Sitemaps are a valuable tool in helping search engines crawl your website. In this post I’ll show you how you can have a dynamically generated sitemap for a WordPress blog hosted on Google App Engine.
Because of the read-only nature of the file system on App Engine, we’ll use Google Cloud Storage to store the generated sitemap file. We’ll also add some handlers to serve the sitemap, and to update it as part of a scheduled cron job.
Prerequisites
We’ll assume that you already have a WordPress site up and running on Google App Engine. If you don’t, you can find more information here. We’ll also assume that you have installed the Google App Engine WordPress plugin and have it correctly configured to upload media files to a Google Cloud Storage Bucket.
Install a Sitemap Generator Plugin
For this tutorial we’re going to use the Google XML Sitemaps plugin to generate our dynamic sitemap – it has plenty of configuration options that make it easy for us to set it up to work well on App Engine.
To install the plugin:
- Download the plugin to your local development environment.
- Unzip the plugin to the wordpress/wp-content/plugins folder of your application.
- Use appcfg.py to upload the new files to your App Engine application.
$ appcfg.py -A your_application_name update .
- Go to the WordPress administration console for your site and enable the ‘Google XML Sitemaps’ plugin.
Now that the plugin is installed and enabled, we need to configure it. We’ll need to know the name of the Google Cloud Storage bucket that we are using to store uploads for the application. If you forget, you can check the configuration of the App Engine plugin, which shows the name of the bucket, as in the screen shot below, where the bucket name is gae-php-tips-blog.
Armed with the knowledge of the bucket name that we’ll be saving the sitemaps file into, open the settings page for the ‘Google XML Sitemaps’ plugin. Make the following changes in the settings page:
In the ‘Basic Options’ section:
- Uncheck ‘Write a gzipped file’.
- Uncheck ‘Rebuild sitemap if you change the content of your blog’.
- Check ‘Enable manual sitemap building via GET request’.
- Uncheck ‘Add sitemap URL to the virtual robots.txt file’.
We’ll use a custom location for the sitemap.xml file, which we will write to the Google Cloud Storage bucket for your WordPress blog.
In the ‘Location of your sitemap file’ section:
- Select ‘Custom Location’.
- Set the absolute path to ‘gs://your_cloud_storage_bucket/sitemap.xml’ where your_cloud_storage_bucket is the name of your Cloud Storage Bucket.
After this, your settings should look similar to the screen shots below.
The plugin should now be ready to create a sitemap.xml file for your site and save it to Google Cloud Storage.
Create the sitemap.xml file in Google Cloud Storage
One of the prerequisites of the Google XML Sitemaps plugin is that the file sitemap.xml already exists in the target location. This is because it calls is_file_writable before writing the file, and if the file doesn’t exist the call will fail. To pre-create the sitemap.xml file we can use the Cloud Console, gsutil, or a small PHP script deployed with your application (I’ll show you how later on). If you’re savvy with gsutil you can get the job done from the command line. First we’ll create a sitemap.xml file and then we’ll copy it to our bucket.
$ touch sitemap.xml
$ gsutil cp sitemap.xml gs://<your_bucket_name>/sitemap.xml
Naturally change “your_bucket_name” to the name of your actual bucket in the command.
Create a request handler for sitemap.xml
When a web crawler accesses the url /sitemap.xml on your WordPress site, we want to return the file stored in Google Cloud Storage. To do this, we’ll add a short PHP script to handle these requests.
In the root folder of your application create a file called sitemap.php and copy the following content into it.
<?php
require_once 'wordpress/wp-load.php';
require_once 'google/appengine/api/cloud_storage/CloudStorageTools.php';

use google\appengine\api\cloud_storage\CloudStorageTools;

// Get the name of the bucket configured in the App Engine plugin.
$bucket = get_option('appengine_uploads_bucket', '');
if ($bucket) {
  CloudStorageTools::serve('gs://' . $bucket . '/sitemap.xml',
                           ['content_type' => 'text/xml']);
}

This script uses the CloudStorageTools::serve method to send the content of the Google Cloud Storage file as the response to the request.
Now we can add a handler in our app.yaml file for requests to /sitemap.xml. Open your app.yaml file and add the following:

- url: /sitemap.xml
  script: sitemap.php

Note: This must go above the catch-all handler in the app.yaml file.
Use cron to create sitemaps
We’ll use cron to automatically update the sitemap file every day. This helps us get around the 60 second request timeout for frontend requests on App Engine. If you updated the sitemap when you published a new page in your blog, then it might not complete within 60 seconds; using cron we have up to 10 minutes available to build the sitemap.
To use cron, we’ll create another PHP script that will build the sitemap. In the root folder of your application create a file called sitemap_build.php and copy the following contents into it.
<?php
/**
 * Rebuild the sitemap for the site, should be called from a cron job.
 */
require 'wordpress/wp-load.php';

// Check that the sitemap file exists and create it if it does not.
$bucket_name = get_option('appengine_uploads_bucket', '');
$sitemap_file = 'gs://' . $bucket_name . '/sitemap.xml';
if (!file_exists($sitemap_file)) {
  $ctx = stream_context_create(['gs' => ['content-type' => 'text/xml']]);
  file_put_contents($sitemap_file, '<?xml version="1.0" encoding="UTF-8"?>', false, $ctx);
}

do_action('sm_rebuild');
You’ll notice I also added code to create an empty sitemap file before executing the build action if one does not already exist.
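One thing to watch in the script above: if the appengine_uploads_bucket option is empty, the path becomes gs:///sitemap.xml, which is not a valid bucket path. Here is a minimal sketch of a guard you could add. The function name sitemap_gcs_path is my own invention for illustration; it is not part of WordPress or the App Engine SDK.

```php
<?php
/**
 * Build the gs:// path of the sitemap file for a bucket name.
 *
 * Hypothetical helper for illustration only.
 * Returns null when no bucket name is configured, so the caller can
 * skip the rebuild instead of writing to an invalid path.
 */
function sitemap_gcs_path($bucket_name)
{
    $bucket_name = trim((string) $bucket_name);
    if ($bucket_name === '') {
        return null;
    }
    return 'gs://' . $bucket_name . '/sitemap.xml';
}
```

In sitemap_build.php you could pass the result of get_option('appengine_uploads_bucket', '') to this function and only continue with the rebuild when the return value is not null.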
In your app.yaml file add a handler that will call this script when the URL /sitemap-build is accessed. We can secure this URL using the login: admin value for the handler, described here.

- url: /sitemap-build
  script: sitemap_build.php
  login: admin
Once again – put this line above the catch-all URL handler in your app.yaml file.
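If you want the build script to know whether it was triggered by cron or by an administrator in a browser, App Engine marks requests from the cron service with an X-Appengine-Cron: true header and strips that header from external requests. The sketch below shows one way to check for it. The function name is hypothetical, and it takes the server array as a parameter so it can be tried outside App Engine.

```php
<?php
/**
 * Report whether a request was issued by the App Engine cron service.
 *
 * Hypothetical helper for illustration only. Pass in $_SERVER, where PHP
 * exposes the X-Appengine-Cron header as HTTP_X_APPENGINE_CRON.
 */
function is_appengine_cron_request(array $server)
{
    return isset($server['HTTP_X_APPENGINE_CRON'])
        && $server['HTTP_X_APPENGINE_CRON'] === 'true';
}
```

Treat this as extra information, for logging for example, rather than as a gate: rejecting requests without the header would also block the manual build from the browser described later in this post.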
Now, we need to add the cron job to the cron.yaml file. If you followed the previously linked tutorial you should already have a cron.yaml file for your app, but if you don’t have one just create a new one in the root directory of your project.
Edit the cron.yaml file and add a scheduled call of your sitemap-building URL. In our case, we’ll rebuild the sitemap every 24 hours. Your cron.yaml file might look like this when you’ve finished.

cron:
- description: wordpress cron tasks
  url: /wp-cron.php
  schedule: every 2 hours
- description: sitemap build task
  url: /sitemap-build
  schedule: every 24 hours
Now you’re ready to push the updates to your application to production (or you can test locally first). Use appcfg.py to push up the new version of your application.
appcfg.py update .
Once you’ve pushed the new version of the application, we’re ready to build sitemaps.
First, go to the App Engine admin console for your app, and check that the cron job has been scheduled to run. You should see something similar to the screen shot below in the cron jobs tab.
Now, we can build the sitemap by calling the /sitemap-build URL of your application from your browser (you need to be logged in as your app administrator to call this URL), and then download it to check that it worked.
To do this in Chrome:
- Create the sitemap file by accessing the URL http://your-application-name.appspot.com/sitemap-build.
- To view the sitemap, enter view-source:your-application-name.appspot.com/sitemap.xml in the Chrome address bar.
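Beyond eyeballing the XML in the browser, you can check the result with a few lines of PHP. The sketch below is one possible approach, not something the plugin provides: it assumes you have the sitemap content as a string (for example, saved from the browser) and that it uses the standard sitemaps.org namespace. The function name sitemap_locations is hypothetical.

```php
<?php
/**
 * Extract every <loc> URL from a sitemap document.
 *
 * Hypothetical helper for illustration only. Returns an empty array when
 * the input is empty or is not well-formed XML, which is what you would
 * see if only the pre-created placeholder file is being served.
 */
function sitemap_locations($xml)
{
    if (trim($xml) === '') {
        return array();
    }

    // Collect parse errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $loaded = $dom->loadXML($xml);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if (!$loaded) {
        return array();
    }

    $ns = 'http://www.sitemaps.org/schemas/sitemap/0.9';
    $locations = array();
    foreach ($dom->getElementsByTagNameNS($ns, 'loc') as $node) {
        $locations[] = trim($node->textContent);
    }
    return $locations;
}
```

Calling count(sitemap_locations(file_get_contents('sitemap.xml'))) on a locally saved copy gives a quick sanity check that the build produced entries for your posts.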
Followup
Now that your sitemap is being generated you can also:
- Go to Google Webmaster Tools and tell it the location of the sitemap for your site.
- Update the robots.txt file for your site to include a link to the sitemap.xml file.
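For the robots.txt entry, the sitemap is referenced with a Sitemap: line that contains the full URL of the file. Here is a small sketch of building that line in PHP; the function name is hypothetical, and how you get the line into your robots.txt depends on how your site serves that file.

```php
<?php
/**
 * Build the robots.txt line that points crawlers at the sitemap.
 *
 * Hypothetical helper for illustration only. $site_url is the absolute
 * base URL of the site, with or without a trailing slash.
 */
function robots_sitemap_line($site_url)
{
    return 'Sitemap: ' . rtrim($site_url, '/') . '/sitemap.xml';
}
```

For the example application used in this post, the function returns the line Sitemap: http://your-application-name.appspot.com/sitemap.xml.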