How To Create An Automatic Sitemap For Your Rails App On Heroku (RailsShorts)

Here's a short article on how to configure your Rails app to automatically generate a sitemap for your domain. The setup below regenerates a sitemap.xml.gz file once a week (or more frequently, if you prefer).

First, add these gems to your Gemfile.

# Gemfile
gem "fog-aws"
gem "sitemap_generator"

The sitemap_generator gem (https://github.com/kjvarga/sitemap_generator) builds an XML sitemap from the paths you list in its configuration file, and fog-aws lets it upload the result to S3. To install both, run:

bundle install

Then create a sitemap.rb file inside your /config directory (not inside initializers!). In this file you define the configuration for the generator and the actual paths you want it to include. Since we want the sitemap to be generated in production and Heroku doesn't offer persistent file storage, you will need an AWS account and an S3 bucket to upload the file to (https://devcenter.heroku.com/articles/s3).

# config/sitemap.rb
SitemapGenerator::Sitemap.default_host = "http://www.mydomain.com" # Your Domain Name
SitemapGenerator::Sitemap.public_path = 'tmp/sitemap'
# Where you want your sitemap.xml.gz file to be uploaded.
SitemapGenerator::Sitemap.adapter = SitemapGenerator::S3Adapter.new(
  aws_access_key_id: ENV["S3_ACCESS_KEY"],
  aws_secret_access_key: ENV["S3_SECRET_KEY"],
  fog_provider: 'AWS',
  fog_directory: ENV["S3_BUCKET_NAME"],
  fog_region: ENV["S3_REGION"]
)

# The full path to your bucket
SitemapGenerator::Sitemap.sitemaps_host = "https://#{ENV['S3_BUCKET_NAME']}.s3.amazonaws.com"

# The paths that need to be included in the sitemap.
SitemapGenerator::Sitemap.create do
  Article.find_each do |article|
    add article_path(article.slug, locale: :en)
    add article_path(article, locale: :nl) if article.slug_nl != ""
  end

  Project.find_each do |project|
    add project_path(project, locale: :en)
    add project_path(project, locale: :nl)
  end

  Page.find_each do |page|
    add page_path(page, locale: :en)
    add page_path(page, locale: :nl)
  end

  add "en/single-page"
  add "nl/single-page"
  add "nl/starters-website"
  add "en/starters-website"
  add "nl/website-op-maat"
  add "en/website-op-maat"
  add "nl/webapplicatie"
  add "en/webapplicatie"
  add "nl/website-analyse"
  add "en/website-analyse"
end
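
A detail that is easy to get wrong in the sitemaps_host line is the quoting inside the interpolation: anything wrapped in single quotes inside #{} is inserted as literal text rather than evaluated. The sketch below illustrates the difference in plain Ruby. The bucket_host helper and the "my-bucket" fallback are made up for this illustration and are not part of sitemap_generator.

```ruby
# Hypothetical helper (not part of sitemap_generator): builds the public
# host of an S3 bucket from its name.
def bucket_host(bucket_name)
  "https://#{bucket_name}.s3.amazonaws.com"
end

# Evaluated: the value of the environment variable ends up in the URL.
# "my-bucket" is only a placeholder used when the variable is not set.
resolved = bucket_host(ENV.fetch("S3_BUCKET_NAME", "my-bucket"))

# Not evaluated: the single-quoted string is inserted as-is, so the URL
# contains the text ENV["S3_BUCKET_NAME"] instead of the bucket name.
literal = "https://#{'ENV["S3_BUCKET_NAME"]'}.s3.amazonaws.com"

puts resolved
puts literal
```

If the second form ends up in config/sitemap.rb, the upload itself may still succeed, but the host written into the sitemap will not point at your bucket.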

Now, when you run rake sitemap:refresh, you should see output similar to the following. While you're testing, you can run rake sitemap:refresh:no_ping instead so that search engines aren't notified.

$ rake sitemap:refresh
In '/home/simon/Desktop/Code/personal/truetech-v4/public/':
+ sitemap.xml.gz                                          61 links /  974 Bytes
Sitemap stats: 61 links / 1 sitemaps / 0m00s

Now visit your AWS bucket and see if the uploaded file is there! 
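
If you want to go one step further than eyeballing the bucket, you can download the file and count its entries locally. This is a minimal sketch using only Ruby's standard library; the count_sitemap_links helper name and the local file path in the usage line are assumptions for illustration.

```ruby
require "zlib"
require "stringio"

# Hypothetical helper: takes the raw bytes of a gzipped sitemap and
# returns the number of <loc> entries it contains.
def count_sitemap_links(gz_data)
  xml = Zlib::GzipReader.new(StringIO.new(gz_data)).read
  xml.scan(/<loc>/).size
end
```

For a copy downloaded next to your script, count_sitemap_links(File.binread("sitemap.xml.gz")) should roughly match the link count that the rake task reported.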

All right, that's 75% of the way. Next, point search engines at the file: list the sitemap in robots.txt and add a route that redirects to the copy on S3. That way Google can retrieve the file on future crawls, and you can submit the same URL through the webmaster console.

# public/robots.txt
Sitemap: http://www.mydomain.com/sitemap.xml.gz

# config/routes.rb
get '/sitemap.xml.gz', to: redirect("https://s3-eu-west-1.amazonaws.com/XXXXX-YOUR-BUCKET-XXXXX/sitemap.xml.gz")
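
As a quick sanity check on the robots.txt side, the sketch below pulls the Sitemap entries out of a robots.txt body so you can confirm the URL matches the route above. The sitemap_urls helper is hypothetical and written in plain Ruby; it is not something Rails or sitemap_generator provides.

```ruby
# Hypothetical helper: returns every URL declared on a "Sitemap:" line
# of a robots.txt body (the directive name is matched case-insensitively).
def sitemap_urls(robots_txt)
  robots_txt.each_line.map { |line| line[/\A\s*sitemap:\s*(\S+)/i, 1] }.compact
end

robots = <<~ROBOTS
  User-agent: *
  Disallow:
  Sitemap: http://www.mydomain.com/sitemap.xml.gz
ROBOTS

puts sitemap_urls(robots)
```

You could feed it File.read("public/robots.txt") from the root of your app to check the real file.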

Now all that's left to do is run rake sitemap:refresh automatically on Heroku. For this I've used the free Heroku Scheduler add-on (https://elements.heroku.com/addons/scheduler). It lets you run a particular rake task (like sitemap:refresh) on a daily basis.

The scheduler can run tasks every hour or every day. If that is too frequent, create a separate rake task that checks the current date, day of the week, or whatever interval you want to use before doing the job. For example:

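# lib/tasks/sitemap.rake
desc "Refresh the sitemap, but only on Tuesdays"
task :generate_sitemap do
  # The scheduler runs this task daily; do the work only on Tuesdays.
  if Time.now.tuesday?
    Rake::Task["sitemap:refresh"].invoke
  end
end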
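
If you would rather not hard-code Tuesday, the day check can be pulled into a small method that takes the weekday as a parameter. This is only a sketch in plain Ruby; the sitemap_due? name and its wday keyword are made up for this example.

```ruby
# Hypothetical helper: true when `time` falls on the given weekday.
# Weekdays follow Time#wday: 0 = Sunday, 1 = Monday, ... 6 = Saturday.
def sitemap_due?(time = Time.now, wday: 2)
  time.wday == wday
end

puts sitemap_due?(Time.utc(2024, 1, 2))          # 2 January 2024 was a Tuesday
puts sitemap_due?(Time.utc(2024, 1, 7), wday: 0) # 7 January 2024 was a Sunday
```

Inside the rake task you would then invoke sitemap:refresh only when sitemap_due? returns true, which also makes the schedule easy to test without waiting for the right day.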

Now you're ready to check everything with Google Webmaster Tools! And your sitemap is one less thing to worry about :)
