How To Create An Automatic Sitemap For Your Rails App On Heroku (RailsShorts)
Here's a short article on how to configure your Rails app to generate sitemaps for your domain automatically. The setup below regenerates a sitemap.xml.gz file once a week (or more frequently, if you prefer).
First, add these gems to your Gemfile.
# Gemfile
gem "fog-aws"
gem "sitemap_generator"
The sitemap_generator gem (https://github.com/kjvarga/sitemap_generator) will help convert your current routes into an XML file, and fog-aws handles the upload to S3. To install both, run:
bundle install
Then create a sitemap.rb file inside your /config directory (not inside initializers!). In this file you define the configuration for the generator and the actual paths you want it to include. Since we want the sitemap to be generated in production and Heroku doesn't allow persistent file uploads, you will need an AWS account and a bucket that the file can be uploaded to on Amazon S3.
# config/sitemap.rb
SitemapGenerator::Sitemap.default_host = "http://www.mydomain.com" # Your Domain Name
SitemapGenerator::Sitemap.public_path = 'tmp/sitemap'
# Where you want your sitemap.xml.gz file to be uploaded.
SitemapGenerator::Sitemap.adapter = SitemapGenerator::S3Adapter.new(
  aws_access_key_id: ENV["S3_ACCESS_KEY"],
  aws_secret_access_key: ENV["S3_SECRET_KEY"],
  fog_provider: 'AWS',
  fog_directory: ENV["S3_BUCKET_NAME"],
  fog_region: ENV["S3_REGION"]
)
# The full path to your bucket
SitemapGenerator::Sitemap.sitemaps_host = "https://#{ENV['S3_BUCKET_NAME']}.s3.amazonaws.com"
# The paths that need to be included into the sitemap.
SitemapGenerator::Sitemap.create do
  Article.find_each do |article|
    add article_path(article.slug, locale: :en)
    add article_path(article.slug, locale: :nl)
  end
  Project.find_each do |project|
    add project_path(project, locale: :en)
    add project_path(project, locale: :nl)
  end
  Page.find_each do |page|
    add page_path(page, locale: :en)
    add page_path(page, locale: :nl)
  end
  add "en/single-page"
  add "nl/single-page"
  add "nl/starters-website"
  add "en/starters-website"
  add "nl/website-op-maat"
  add "en/website-op-maat"
  add "nl/webapplicatie"
  add "en/webapplicatie"
  add "nl/website-analyse"
  add "en/website-analyse"
end
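Because the adapter reads all of its settings from environment variables, a missing variable tends to surface only when the upload fails. As a minimal sketch (the helper name missing_s3_vars is hypothetical and not part of sitemap_generator or fog-aws), you could check the four variables used above before running the task:

```ruby
# Hypothetical helper, not part of sitemap_generator or fog-aws.
# The variable names match the ones read in config/sitemap.rb above.
REQUIRED_S3_VARS = %w[S3_ACCESS_KEY S3_SECRET_KEY S3_BUCKET_NAME S3_REGION].freeze

# Returns the names of required variables that are unset or blank.
# `env` defaults to the real environment; pass a Hash to try it out.
def missing_s3_vars(env = ENV)
  REQUIRED_S3_VARS.select { |name| env[name].to_s.strip.empty? }
end

missing = missing_s3_vars
warn "Missing S3 settings: #{missing.join(', ')}" unless missing.empty?
```

On Heroku, variables like these are normally set with heroku config:set, so an empty result here suggests the config vars are in place.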
Now when you run rake sitemap:refresh, you should see output similar to the listing below. While you're testing, you can run rake sitemap:refresh:no_ping instead, so that search engines are not notified.
$ rake sitemap:refresh
In '/home/simon/Desktop/Code/personal/truetech-v4/public/':
+ sitemap.xml.gz 61 links / 974 Bytes
Sitemap stats: 61 links / 1 sitemaps / 0m00s
Now visit your AWS bucket and see if the uploaded file is there!
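If you want to look inside the generated file rather than just confirm it exists, a quick sanity check is to unzip it and list the URLs. The sketch below uses a hypothetical sitemap_locs helper and a small in-memory sample; it relies on a regular expression, which is fine for eyeballing the result but is not a full XML parser.

```ruby
require "zlib"

# Hypothetical helper: returns every <loc> URL found in a gzipped sitemap.
# A regex is enough for a quick sanity check; it is not a full XML parser.
def sitemap_locs(gzipped_bytes)
  xml = Zlib.gunzip(gzipped_bytes)
  xml.scan(%r{<loc>(.*?)</loc>}m).flatten.map(&:strip)
end

# Build a tiny gzipped sitemap in memory so the sketch runs on its own.
sample = Zlib.gzip(<<~XML)
  <urlset>
    <url><loc>http://www.mydomain.com/en/single-page</loc></url>
    <url><loc>http://www.mydomain.com/nl/single-page</loc></url>
  </urlset>
XML

puts sitemap_locs(sample).size # prints 2
```

For a real file you would pass in something like File.binread("tmp/sitemap/sitemap.xml.gz"), assuming the public_path from the config above and that a local copy is kept there.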
All right, that's 75% of the way. Next, point crawlers at the file: list the sitemap in robots.txt and add a route that redirects to the copy on S3, so that Google can be notified of changes through the webmaster console and can retrieve the file on future crawls.
# public/robots.txt
Sitemap: http://www.mydomain.com/sitemap.xml.gz
# config/routes.rb
get '/sitemap.xml.gz', to: redirect("https://s3-eu-west-1.amazonaws.com/XXXXX-YOUR-BUCKET-XXXXX/sitemap.xml.gz")
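The bucket URL in the route is hard-coded, so it can drift from the environment variables used in config/sitemap.rb. One option, sketched here with a hypothetical sitemap_s3_url helper, is to build the redirect target from the bucket name and region. The URL shape simply mirrors the path-style URL in the route above; adjust it if your bucket is served from a different endpoint.

```ruby
# Hypothetical helper: builds the redirect target in the same path-style
# shape as the route above (https://s3-REGION.amazonaws.com/BUCKET/FILE).
def sitemap_s3_url(bucket:, region:, filename: "sitemap.xml.gz")
  "https://s3-#{region}.amazonaws.com/#{bucket}/#{filename}"
end

puts sitemap_s3_url(bucket: "my-bucket", region: "eu-west-1")
# prints https://s3-eu-west-1.amazonaws.com/my-bucket/sitemap.xml.gz
```

In the routes file, the call would take ENV["S3_BUCKET_NAME"] and ENV["S3_REGION"] as arguments, assuming the helper is defined somewhere the routes file can load it.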
Now all that's left to do is to run rake sitemap:refresh as an automatic task on Heroku. For this I've used the free Heroku Scheduler add-on (https://elements.heroku.com/addons/scheduler). This allows you to run a particular rake task (like sitemap:refresh) on a daily basis.
It lets you schedule tasks every hour or every day. If that is too frequent, create a separate rake task that checks the current date, day of the week, or whatever interval you want before doing the job. For example:
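# lib/tasks/sitemap.rake
desc "Refresh the sitemap, but only on Tuesdays"
task :generate_sitemap do
  # Time#tuesday? is core Ruby, so no extra require is needed.
  if Time.now.tuesday?
    Rake::Task["sitemap:refresh"].invoke
  end
end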
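The weekday check is easier to try out if it lives in a plain method instead of inside the task. A minimal sketch, using a hypothetical sitemap_due? helper that takes the date as an argument so you can feed it known dates:

```ruby
require "date"

# Hypothetical helper: true when `date` falls on the chosen weekday.
# wday follows Ruby's convention: 0 = Sunday ... 6 = Saturday.
def sitemap_due?(date = Date.today, wday: 2)
  date.wday == wday
end

puts sitemap_due?(Date.new(2024, 1, 2)) # 2 January 2024 was a Tuesday, prints true
puts sitemap_due?(Date.new(2024, 1, 3)) # a Wednesday, prints false
```

The rake task could then call sitemap_due? in place of Time.now.tuesday?, and changing the wday argument moves the refresh to another day.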
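Schedule this task (rake generate_sitemap) to run daily in Heroku Scheduler, and it will only do the work on Tuesdays. Now you're ready to check everything with Google Webmaster Tools! And your sitemap is one less thing to worry about :)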