Sitemap generation
- Posted by dlib on July 15th, 2008 filed in Code examples, Medium, ORM
Google launched their Webmaster Tools some time ago. Part of this is sitemap.xml, a file listing all the URLs of your website to allow for easier indexing by Google and other search engines. It’s XML, so writing it by hand would be a serious PITA. Today I whipped up a small library for Kohana to help with this. In part it uses the get_url() method I posted about earlier.
So, how does it work? We begin by adding a route in /application/config/routes.php:
$config['sitemap.xml'] = 'sitemap';
This will map all requests for example.com/sitemap.xml to the sitemap controller. The next step is creating that controller.
//application/controllers/sitemap.php
class Sitemap_Controller extends Controller{
    public function index(){
        $sitemap=new Sitemap; //create new sitemap
        $sitemap->add_url('http://www.example.com','2008-05-31','weekly',1); //url, last modified, change frequency, priority
        $sitemap->location='http://www.example.com/sitemap.xml'; //not necessary really since this url is assumed
        echo $sitemap->render(); //will output the sitemap and add an xml header
        $sitemap->ping_google(); //tell Google about the sitemap
    }
}
This will output a validating sitemap.xml. Caching is something for the future right now. Other methods are save(), which saves the sitemap to a file, and get(), which retrieves the sitemap as a string.
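To make the output format concrete, here is a minimal sketch of the kind of XML such a render step produces, following the sitemaps.org 0.9 protocol. The function name sketch_render_sitemap() and the array layout are my own assumptions for illustration; this is not the library's actual render() code.

```php
<?php
// Hypothetical helper for illustration only; not the library's render().
// Each entry is an array with a required 'loc' key and optional
// 'lastmod', 'changefreq' and 'priority' keys.
function sketch_render_sitemap(array $entries)
{
    $xml  = '<?xml version="1.0" encoding="UTF-8"?>' . "\n";
    $xml .= '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">' . "\n";

    foreach ($entries as $entry) {
        $xml .= "  <url>\n";
        // <loc> is the only required child; its value must be XML-escaped.
        $xml .= '    <loc>' . htmlspecialchars($entry['loc'], ENT_QUOTES, 'UTF-8') . "</loc>\n";

        // Optional children, written only when the entry provides them.
        foreach (array('lastmod', 'changefreq', 'priority') as $tag) {
            if (isset($entry[$tag])) {
                $value = htmlspecialchars((string) $entry[$tag], ENT_QUOTES, 'UTF-8');
                $xml .= "    <$tag>$value</$tag>\n";
            }
        }
        $xml .= "  </url>\n";
    }

    return $xml . "</urlset>\n";
}

// Usage: one entry mirroring the add_url() call in the controller.
echo sketch_render_sitemap(array(
    array(
        'loc'        => 'http://www.example.com',
        'lastmod'    => '2008-05-31',
        'changefreq' => 'weekly',
        'priority'   => '1.0',
    ),
));
```

Escaping the URL matters because query strings with & would otherwise make the file invalid XML.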
I talked about the ORM::get_url() method in an earlier post. It comes in quite handy in this class though.
$sitemap=new Sitemap; //create new sitemap
$sitemap->add_model('article');
This code will call add_url() with the result of $article->get_url() for each Article_Model record in the table. If you have a column ‘modified’ in your table and that column returns a timestamp (perhaps using __get), it will also set the lastmod element in the XML file.
Conditions are also possible:
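Roughly, the idea is a loop over records. The sketch below shows that loop under my own assumptions: Sketch_Article is a stand-in for an ORM model, its URL pattern is made up, and sketch_entries_from_records() is a hypothetical helper, not the library's add_model().

```php
<?php
// Stand-in for an ORM record; a real Article_Model would come from the app.
class Sketch_Article
{
    public $slug;
    public $modified; // Unix timestamp, or null when unknown

    public function __construct($slug, $modified = null)
    {
        $this->slug     = $slug;
        $this->modified = $modified;
    }

    // Assumed URL scheme, purely for illustration.
    public function get_url()
    {
        return 'http://www.example.com/article/' . rawurlencode($this->slug);
    }
}

// Hypothetical helper: turn records into sitemap entries.
function sketch_entries_from_records(array $records)
{
    $entries = array();
    foreach ($records as $record) {
        $entry = array('loc' => $record->get_url());
        // Only set lastmod when the record carries a modification timestamp.
        if (!empty($record->modified)) {
            $entry['lastmod'] = gmdate('Y-m-d', $record->modified);
        }
        $entries[] = $entry;
    }
    return $entries;
}

// Usage: two records, only the first has a 'modified' timestamp.
$entries = sketch_entries_from_records(array(
    new Sketch_Article('hello-world', 1212192000),
    new Sketch_Article('draft'),
));
print_r($entries);
```

The lastmod date is formatted in UTC as YYYY-MM-DD, which is one of the W3C datetime forms the sitemap protocol accepts.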
$sitemap=new Sitemap; //create new sitemap
$sitemap->add_model('article',array('is_published'=>1)); //where condition like in db builder
$sitemap->get(); //retrieve the sitemap string
You can see that generating a sitemap from a model is very easy using the get_url() method, so you can quickly set up the sitemap from your models. Of course, it doesn’t cover complicated cases. Google and other search engines can now access your sitemap through example.com/sitemap.xml.
If you have a lot of records this library might be costly, so you should put in some caching.
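As an illustration of what such caching could look like, here is a minimal file-based sketch in plain PHP. The function sketch_cached_sitemap(), the cache file path and the one-hour lifetime are all assumptions of mine; it is independent of Kohana and of whatever the library itself does.

```php
<?php
// Hypothetical helper: return the cached sitemap if it is fresh enough,
// otherwise rebuild it with the given callback and store the result.
function sketch_cached_sitemap($cache_file, $max_age, $build)
{
    // Drop PHP's cached stat info so the modification time is current.
    clearstatcache(true, $cache_file);

    if (is_file($cache_file) && (time() - filemtime($cache_file)) < $max_age) {
        return file_get_contents($cache_file);
    }

    // Cache miss or stale file: regenerate and overwrite under a lock.
    $xml = call_user_func($build);
    file_put_contents($cache_file, $xml, LOCK_EX);

    return $xml;
}

// Usage: the closure stands in for the expensive sitemap generation.
$cache_file = sys_get_temp_dir() . '/sketch_sitemap_cache.xml';
echo sketch_cached_sitemap($cache_file, 3600, function () {
    return "<urlset></urlset>\n";
});
```

With this shape the database is only queried when the cached file is older than the chosen lifetime.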
Update
I added caching to the library. For an example, see the repo under controllers.
July 15th, 2008 at 5:24 am
Just one word, brilliant. Keep the good work.
July 15th, 2008 at 1:13 pm
Great
July 17th, 2008 at 7:21 am
+1
Btw, I’d add ‘unique location entry’ functionality.
July 17th, 2008 at 8:13 am
I’ll look into it
July 20th, 2008 at 11:20 am
In SVN there is now a check for URL existence: if the URL already exists it’s not added, but no exception or anything is thrown.