Thursday 26 November 2009

Site Crawling for SEO

Spent some time today working on Spiralytics - our web crawling software. It crawls web sites for SEO purposes and builds up a report of all the pages it finds.

We had a problem a few weeks ago with a crawl on one of our sites - it could only find a small percentage of the web site. Normally this is caused by links embedded in JavaScript or Flash, but this time the pages were linked with normal anchor links. After some investigation I found the issue was caused by the web site returning HTTP status code 403.

HTTP Error 403


The 403 Forbidden status code is normally returned by the server when clients are not allowed to view a page, for example if you attempt to view a directory like /pages/ that has no index page and directory listing is disabled. There are other reasons, including the server incorrectly returning 403 instead of 401 Unauthorised. It was none of these, because the page is visible to web browsers. That only leaves something to do with the server not liking our crawler!
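To make the 401/403 distinction concrete, here is a rough Python sketch of how a crawler might interpret the status codes it gets back. It is only an illustration: the function name explain_status and the wording of the messages are my own inventions, not part of Spiralytics.

```python
from http import HTTPStatus


def explain_status(code):
    """Return a crawler-oriented reading of an HTTP status code.

    Hypothetical helper for illustration; raises ValueError for codes
    that are not in Python's HTTPStatus table.
    """
    if code == HTTPStatus.UNAUTHORIZED:  # 401
        return "authentication required: retrying with credentials may work"
    if code == HTTPStatus.FORBIDDEN:  # 403
        return "refused: the server understood the request but will not serve it"
    if 200 <= code < 300:
        return "ok"
    # Fall back to the standard reason phrase, e.g. "not found" for 404.
    return HTTPStatus(code).phrase.lower()
```

A crawl report could store this text next to each URL, so a run of 403s stands out straight away instead of looking like missing links.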

I first tried changing the user-agent to various Mozilla and Googlebot strings, but still got the same response. I then tried slowing down the crawler in case the server was restricting clients that make many requests from the same IP within a short time period. But again, no luck.
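For anyone wanting to try the same two experiments, here is a minimal Python sketch using only the standard library. It is not the Spiralytics code: the user-agent strings, the five second delay and the function names (build_request, fetch_status, probe) are all assumptions made up for the example.

```python
import time
import urllib.error
import urllib.request

# Hypothetical user-agent strings to try; not the ones used in the real crawl.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 6.1; rv:3.5) Gecko/20091102 Firefox/3.5.5",
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
]


def build_request(url, user_agent):
    """Return a GET request that announces itself with the given user-agent."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_status(request):
    """Return the HTTP status code, including error codes such as 403."""
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urlopen raises for 4xx/5xx responses; the code is what we want.
        return err.code


def probe(url, delay_seconds=5.0, fetch=fetch_status, sleep=time.sleep):
    """Request the URL once per user-agent, pausing between requests.

    Returns a dict mapping each user-agent string to the status code seen.
    The fetch and sleep parameters exist so they can be swapped out in tests.
    """
    results = {}
    for index, agent in enumerate(USER_AGENTS):
        if index > 0:
            sleep(delay_seconds)  # throttle so requests are not back to back
        results[agent] = fetch(build_request(url, agent))
    return results
```

Calling probe("http://example.com/") returns one status code per user-agent. If every entry comes back as 403 even with a long delay, as it did for us, the block is probably keyed on something other than the user-agent header or the request rate.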

So after all that - no luck.  It'll have to wait for another day.

iPhone

Started working on a new iPhone app for a client.  We've already decided on the basic functionality, so I was just finalising some of the details and producing mockups based on the initial designs from our designer.

Wednesday 25 November 2009

My new blog

Welcome to my new blog

I thought it was about time I started a blog! A little bit of info about myself - I am CTO of Web Comms, an iPhone development company. I wanted to create a blog to let everyone know the things we are working on at Web Comms and to share some details of the technical solutions we've found. I'll also be giving info on the latest news and goings-on in the iPhone community that I am involved in or hear about.

The next post will be along soon, when I'll describe some of the things we've been doing today!