Improve Your Search Engine Ranking with Google Sitemaps and PHPBB

If you want lots of visitors to your site then you need a good ranking in the search engines, and the one search engine that outranks the rest is Google. Looking at the AWStats reports for watkissonline.co.uk and firstaidquiz.com, between 10% and 20% of all visitors come from search engines, and of those up to 10 times as many referrals come from Google as from the next highest search engine. So getting a good ranking in Google should significantly increase the number of new visitors.

Google is quite good at finding web sites on its own; it has already found my new Penguin Tutor Linux web site, even though I have never submitted it to any search engines and have not even completed the site yet. However, Google does not always find every page of a website, especially if there are multiple levels of pages.

Google has therefore implemented server-side sitemaps, which allow you to create a list of pages on your server that Google can then use when crawling your site. There are a number of advantages to using these: you can view the status of your pages and how well Google has been able to crawl them; a full list of pages is provided to Google so that it can crawl your entire site; and you can indicate how frequently the content is updated, to influence how often each page is crawled.

I have already implemented sitemaps on watkissonline.co.uk, and added some code to automatically include WordPress pages. See: Using Google Sitemaps with WordPress

With my new website PenguinTutor.com I use phpBB to provide a forum, so I’ve created some PHP code that helps in the creation of a Google sitemap.

The first thing to note is that this is not a standalone program: its output still needs to be converted to XML format using the Python script available from: Google Sitemap Pages. The Google program converts a text file containing a list of URLs into an XML file.
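To give a feel for what that conversion involves, here is a minimal sketch in PHP of turning one line of the text list into a sitemap `<url>` entry. This is not Google's Python script and does not reproduce its behaviour; the function name `url_line_to_xml` is made up for illustration, and a real sitemap file also wraps all the entries in a `<urlset>` root element.

```php
<?php
// Illustrative only: NOT Google's script, just a sketch of the kind of
// conversion it performs for a single line of the URL list.
function url_line_to_xml($line)
{
    // The first column is the URL, the rest are key=value attributes
    $parts = preg_split('/\s+/', trim($line));
    $url = array_shift($parts);

    // Escape characters such as & that are not allowed unescaped in XML
    $xml = "  <url>\n";
    $xml .= "    <loc>" . htmlspecialchars($url, ENT_QUOTES, 'UTF-8') . "</loc>\n";

    foreach ($parts as $part) {
        if (strpos($part, '=') === false) {
            continue; // skip anything that is not key=value
        }
        list($key, $value) = explode('=', $part, 2);
        $xml .= "    <" . $key . ">" . htmlspecialchars($value, ENT_QUOTES, 'UTF-8')
              . "</" . $key . ">\n";
    }

    return $xml . "  </url>\n";
}

echo url_line_to_xml("http://www.example.com/foo/yyy?x=12&y=23 changefreq=weekly priority=0.3");
```

Note that the `&` in the URL has to be escaped as `&amp;` in the XML, which is one reason to leave the conversion to a tool rather than writing the XML by hand. The PHP script in this post only has to produce the text lines.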

So this script creates the text list, which needs to be formatted as in the following example (taken from a sample file provided by Google).

##############################
# To add a list of URLs, make a space-delimited text file. The first
# column contains the URL; then you can specify various optional
# attributes in the form key=value:
#
#  lastmod    = modification time in ISO8601 (YYYY-MM-DDThh:mm:ss+00:00)
#  changefreq = 'always' | 'hourly' | 'daily' | 'weekly' | 'monthly' |
#               'yearly' | 'never'
#  priority   = priority of the page relative to other pages on the same site;
#                a number between 0.0 and 1.0, where 0.0 is the lowest priority
#                and 1.0 is the highest priority
#
# Note that all URLs must be part of the site, and therefore must begin with
# the base_url (e.g., 'http://www.example.com/') as specified in config.xml.
#e.g.
# http://www.example.com/foo/bar
# http://www.example.com/foo/xxx.pdf lastmod=2003-12-31T14:05:06+00:00
# http://www.example.com/foo/yyy?x=12&y=23 changefreq=weekly priority=0.3
##############################
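Lines in this format are simple enough to build with a small helper. The following is only a sketch: `sitemap_line` is a hypothetical function name, not something provided by Google or phpBB, and it does no validation of the attribute names or values.

```php
<?php
// Hypothetical helper (not part of Google's tools): builds one line of the
// URL list from a URL and an optional array of key => value attributes.
function sitemap_line($url, array $attributes = array())
{
    $line = $url;
    foreach ($attributes as $key => $value) {
        // Attributes are space-delimited and written as key=value
        $line .= " " . $key . "=" . $value;
    }
    return $line . "\n";
}

// A URL on its own, then one with optional attributes
echo sitemap_line("http://www.example.com/foo/bar");
echo sitemap_line(
    "http://www.example.com/foo/yyy?x=12&y=23",
    array("changefreq" => "weekly", "priority" => "0.3")
);
```

The priority is passed as a string here so that it is written out exactly as given.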

So I first created a text file using the above format and included all my static pages. I then added the forum pages using the following code.


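<?php
// Start of the url
$hostname = "http://www.penguintutor.com";
// The topic view page of phpBB
$viewtopic = "/phpbb/viewtopic.php?t=";
// Path to the phpBB config file
$forumcfg = "config.php";

// Include the DB details ($dbhost, $dbuser, $dbpasswd, $dbname) from the config file
include ($forumcfg);
// Connect and select the database
// (if at any point we get an error then we just skip that task)
// Note: this uses mysqli, as the older mysql_* functions were removed in PHP 7.
// Error reporting is turned off so that failures return false instead of throwing.
mysqli_report(MYSQLI_REPORT_OFF);
if ($db = @mysqli_connect($dbhost, $dbuser, $dbpasswd, $dbname))
{
	if ($result = @mysqli_query($db, "SELECT topic_id FROM phpbb_topics"))
	{
		while ( $row = mysqli_fetch_row($result) )
		{
			// One line per topic: the URL followed by the optional attributes
			echo ($hostname.$viewtopic.$row[0]." changefreq=weekly priority=0.5\n");
		}
		mysqli_free_result($result);
	}
	mysqli_close($db);
}
?>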

This code is fairly basic. It just reads in the topic numbers and creates a URL entry pointing to each topic. It does not look at individual posts within a topic.
It sets the priority to 0.5 (the maximum is 1.0) and indicates that the pages are updated on a weekly basis.

It would be better to identify the date each page was last updated rather than just setting every page to weekly. A slightly easier solution would be to check the date that the topic was started: any that are fairly recent (e.g. a month old or less) are set to daily, and any that are older than that are set to weekly, monthly, or never.
I may add details if I get the time to implement this. Alternatively, post your own solutions as comments, either to the Penguin Tutor Linux Forum or to this blog entry on WatkissOnline.
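
As a possible starting point for the age-based idea, here is a rough sketch that picks a changefreq value from the date a topic was started. It has not been run against a live forum. The function name `changefreq_for_topic` and the 30-day and 365-day thresholds are arbitrary choices for illustration, and it assumes the start date is available as a Unix timestamp (I believe phpBB 2 stores this in the `topic_time` column of `phpbb_topics`, but check your own schema).

```php
<?php
// Hypothetical helper: picks a changefreq value from the age of a topic.
// $topicTime and $now are Unix timestamps; the thresholds are arbitrary.
function changefreq_for_topic($topicTime, $now)
{
    $ageDays = ($now - $topicTime) / 86400; // 86400 seconds in a day
    if ($ageDays <= 30) {
        return "daily";   // started within roughly the last month
    }
    if ($ageDays <= 365) {
        return "weekly";  // started within roughly the last year
    }
    return "monthly";     // anything older
}

// Example with fixed dates: a topic started 17 days before "now"
$now = gmmktime(0, 0, 0, 2, 1, 2006);        // 1 February 2006
$topicTime = gmmktime(0, 0, 0, 1, 15, 2006); // 15 January 2006

// Topic id 42 is made up for the example
echo "http://www.penguintutor.com/phpbb/viewtopic.php?t=42"
    . " changefreq=" . changefreq_for_topic($topicTime, $now)
    . " priority=0.5\n";
```

To use this in the script above, the query would need to select the start date as well as the topic id, and the fixed "changefreq=weekly" text would be replaced by a call to the helper.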