Sitemap generation

Difficulty: Medium

Google launched its Webmaster Tools some time ago. Part of this is sitemap.xml, a file listing all the URLs of your website so that Google and other search engines can index it more easily. It’s XML, so writing it by hand would be a serious PITA. Today I whipped up a small library for Kohana to help with this. In part it uses the get_url() method I posted about earlier.

So, how does it work? We begin by adding a route in /application/config/routes.php:

$config['sitemap.xml'] = 'sitemap';

This will map all requests for example.com/sitemap.xml to the sitemap controller. The next step is creating that controller.

//application/controllers/sitemap.php
class Sitemap_Controller extends Controller{
	public function index(){
		$sitemap=new Sitemap; //create new sitemap
		$sitemap->add_url('http://www.example.com','2008-05-31','weekly',1); //url, last modified, change frequency, priority
		$sitemap->location='http://www.example.com/sitemap.xml'; //not necessary really since this url is assumed
		echo $sitemap->render(); //will output the sitemap and add an xml header
		$sitemap->ping_google(); //tell Google about the sitemap
	}
}

This will output a validating sitemap.xml. Caching is something for the future for now. Other methods are save(), which saves the sitemap to a file, and get(), which returns the sitemap as a string.
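To give an idea of what the rendered file contains, here is a minimal sketch that builds the XML for a list of URL entries following the sitemaps.org 0.9 schema. It is not the library's code: the function name sketch_render_sitemap() and the array keys are made up for illustration, and the real render() may format things differently.

```php
<?php
// Hypothetical stand-in, NOT the library's actual render(): builds sitemap
// XML for a list of URL entries following the sitemaps.org 0.9 schema.
function sketch_render_sitemap(array $entries)
{
    $xml  = '<?xml version="1.0" encoding="UTF-8"?>' . "\n";
    $xml .= '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">' . "\n";

    foreach ($entries as $entry) {
        $xml .= "  <url>\n";
        // <loc> is the only required child; escape &, <, > and quotes.
        $xml .= '    <loc>' . htmlspecialchars($entry['loc'], ENT_QUOTES, 'UTF-8') . "</loc>\n";
        if (isset($entry['lastmod'])) {
            $xml .= '    <lastmod>' . $entry['lastmod'] . "</lastmod>\n";
        }
        if (isset($entry['changefreq'])) {
            $xml .= '    <changefreq>' . $entry['changefreq'] . "</changefreq>\n";
        }
        if (isset($entry['priority'])) {
            // Priority is a decimal between 0.0 and 1.0.
            $xml .= '    <priority>' . number_format($entry['priority'], 1, '.', '') . "</priority>\n";
        }
        $xml .= "  </url>\n";
    }

    return $xml . "</urlset>\n";
}

// Same values as the add_url() call in the controller above.
echo sketch_render_sitemap(array(
    array(
        'loc'        => 'http://www.example.com',
        'lastmod'    => '2008-05-31',
        'changefreq' => 'weekly',
        'priority'   => 1,
    ),
));
```

Only loc is required by the schema; lastmod, changefreq and priority are optional, which is why the sketch skips them when they are not set.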

I talked about the ORM::get_url() method in an earlier post. It comes in quite handy in this class.

$sitemap=new Sitemap; //create new sitemap
$sitemap->add_model('article');

This code will call add_url() with the result of get_url() for each record in the table (here, each Article_Model). If your table has a column ‘modified’ and that column returns a timestamp (perhaps via __get), it will also set the lastmod element in the XML file.
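The idea behind add_model() can be sketched roughly like this. This is an illustration under assumptions, not the library's implementation: Sketch_Article is a made-up stand-in for an ORM model, the URL scheme is invented, and sketch_entries_from_records() is a hypothetical helper that takes already loaded records instead of querying the database.

```php
<?php
// Hypothetical record class standing in for an ORM model such as Article_Model.
class Sketch_Article
{
    public $slug;
    public $modified; // Unix timestamp, or null when unknown

    public function __construct($slug, $modified = null)
    {
        $this->slug     = $slug;
        $this->modified = $modified;
    }

    // Plays the role of the get_url() method; the URL scheme is made up.
    public function get_url()
    {
        return 'http://www.example.com/article/' . rawurlencode($this->slug);
    }
}

// Hypothetical helper: turns records into sitemap entries, one per record.
function sketch_entries_from_records(array $records)
{
    $entries = array();
    foreach ($records as $record) {
        $entry = array('loc' => $record->get_url());
        // Only set lastmod when the record carries a modification timestamp.
        if (!empty($record->modified)) {
            $entry['lastmod'] = gmdate('Y-m-d', $record->modified);
        }
        $entries[] = $entry;
    }
    return $entries;
}

$entries = sketch_entries_from_records(array(
    new Sketch_Article('hello-world', gmmktime(12, 0, 0, 5, 31, 2008)),
    new Sketch_Article('no-date-yet'),
));
print_r($entries);
```

The second record has no timestamp, so its entry only gets a loc and the lastmod element is simply left out.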

Conditions are also possible:

$sitemap=new Sitemap; //create new sitemap
$sitemap->add_model('article',array('is_published'=>1)); //where condition like in db builder
$sitemap->get();

You can see that generating a sitemap from a model is very easy using the get_url() method, so you can quickly set up the sitemap from your models. Of course, it doesn’t cover complicated cases. Google and other search engines can now access your sitemap through example.com/sitemap.xml.

If you have a lot of records, building the sitemap on every request might be costly, so you should put in some caching.
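One simple way to do that is a file cache with a time-to-live. The sketch below is only an illustration of the idea and not how the library handles caching: sketch_cached() is a hypothetical helper, and the cache file path and lifetime are up to you.

```php
<?php
// Hypothetical helper, not part of the library: returns the cached content
// if the cache file is younger than $ttl seconds, otherwise calls $build,
// stores its result in the cache file and returns it.
function sketch_cached($cache_file, $ttl, $build)
{
    // Make sure we do not read a stale modification time from PHP's stat cache.
    clearstatcache();

    if (is_file($cache_file) && (time() - filemtime($cache_file)) < $ttl) {
        return file_get_contents($cache_file);
    }

    $content = call_user_func($build);
    // LOCK_EX avoids two requests writing the file at the same time.
    file_put_contents($cache_file, $content, LOCK_EX);

    return $content;
}
```

In the controller, the callback passed as $build would create the Sitemap, add the urls or models, and return $sitemap->get(); the controller then echoes whatever sketch_cached() returns. With a lifetime of an hour the database is only hit once per hour, no matter how often the sitemap is requested.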

Update
I added caching to the library. For an example, see the repo under controllers.


5 Responses to “Sitemap generation”

  1. Alex Sancho Says:

    Just one word, brilliant. Keep the good work.

  2. theShark Says:

    Great :)

  3. mrks Says:

    +1 :)

    btw
    I have added ‘unique location entry’ functionality

  4. dlib Says:

    I’ll look into it

  5. dlib Says:

    In SVN there is now a check for URL existence: if the URL already exists it’s not added, but no exception or anything is thrown.
