How to make your website crawlable by search engines
Written by Peter Dowse   
Sunday, 02 March 2008
Making sure your website is crawlable is possibly the most important part of your online marketing strategy, because without a crawlable website you won’t get into a search engine’s index.

Without any pages in a search engine’s index, nobody’s going to find your site through search engines. It’s pretty simple. Making sure your website can be crawled by a search engine’s spider means your website has the potential to be found!

Lots of people ask me “Why isn’t Google crawling my website?” and the answer to this question is usually pretty straightforward. There are loads of reasons why a search engine won’t crawl your website.

Here are a few of the main reasons:-

Full Flash website
Full Flash websites are death for search engine marketing. Technology simply hasn’t gotten to the point where search engines can reliably read words that are locked inside Flash movies or images. If your site is built fully in Flash I can almost guarantee that it will not be indexed properly by search engines and people won’t be able to find it through search.

Example: www.takethisdance.com
I’ve picked this site quite arbitrarily as an example. As a website it looks spectacular and the visual appeal is definitely something that makes this website breathtaking. But look under the hood and you will find that this site will not do very well in driving traffic through search engines.

If you do a simple site operator in Google on this website you will see there’s only one page in the index. Not a great result if you’re looking at driving traffic through search engines.

If you have a full Flash website there are only two ways to get around not being crawled:-
1. Scrap the website and start all over.
2. Create a mirrored website of the Flash version (basically build an HTML version of the Flash website).

Site Architecture
Having the wrong site architecture can mean your website won’t be crawled (the term ‘site architecture’ is just a fancy way of saying ‘how your website has been built’). Typically, if you have JavaScript menu structures or an Ajax-driven website, search engines will have trouble crawling and indexing it.

Example: www.trucksuniqueaz.com
From an SEO standpoint this site is a mess. The first thing wrong with this site is the menu structure. It’s written entirely in JavaScript. It looks great but doesn’t work from an SEO standpoint, as they only have one page in the index.

Not only that, but all their page titles are “Milonic DHTML/JavaScript Menu Sample Page”. What that has to do with trucks I don’t know! I’m really, really hoping that this is a demo site for this JavaScript menu system… but it doesn’t really look that way.

Framed websites
Framed websites are also really difficult to get indexed. If you’re not sure what a framed website looks like check out this example.

If you click on the menu items of this page you will see the URL doesn’t change. Typically when you navigate through the pages of most websites, the URL will change (for example if you go to the about us page it’s usually called www.yourdomain.com.au/aboutus.html or something similar). The website URL in this example stays the same no matter what menu item you click.

The way framed websites work is that pages are pulled into a frame. You usually have a banner, the side menus and the content area. The content pages are sometimes left blank (i.e. no branding on them) because they’re pulled into the frame (header and side area) – the branding component of the site is in the header and side menu. This can be bad news from a branding perspective.

Let’s look at this site again because some of their internal pages have been indexed but not in the right way. If you were to navigate to this site through a search engine you wouldn’t know what to do. For example, this site ranks number one in Google for the term “Austsafe Investment Choice”; now the top result is this URL:-
www.austsafe.com.au/investmentchoice.html

If we go to that page there’s no menu structure, no banner, nothing… just content. This is because the content page is left blank and then pulled into the frame (header and side menu area that actually contains the branding). This page doesn’t give a visitor any options other than clicking the back button and doing another search or closing down the window and going somewhere else.

Too many variables in URLs
When I talk about variables in URLs, I mean symbols like &, ? and > or <. These symbols can make it difficult for search engine spiders to crawl your website, and the more you have in your URLs the harder it will be to crawl. Google is getting pretty good at indexing these types of URLs but Yahoo and MSN are still playing catch-up.

This URL from the Joomla extensions area is a good example of a URL with multiple variables (a few commas, a dash and a #).
http://extensions.joomla.org/component/option,com_mtree/task,viewlink/link_id,394/Itemid,35/#rev-12898

Google can crawl this; however, Yahoo has trouble.
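
If you want a rough idea of how many variables your own URLs carry, a few lines of Python can count them. This is only a sketch: it counts standard query-string parameters (the bits after the ?), and the second URL below is a made-up example rather than a real page.

```python
from urllib.parse import urlsplit, parse_qsl

def count_url_variables(url):
    """Return the number of query-string parameters in a URL."""
    query = urlsplit(url).query
    # keep_blank_values so that "?a=&b=1" still counts as two parameters
    return len(parse_qsl(query, keep_blank_values=True))

# Hypothetical URLs, used for illustration only
urls = [
    "http://www.yourdomain.com.au/aboutus.html",
    "http://www.yourdomain.com.au/index.php?option=com_content&task=view&id=12&Itemid=35",
]

for url in urls:
    print(count_url_variables(url), url)
```

The lower the number, the less work a spider has to do to make sense of the URL.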

Robots.txt file is wrong
What is a robots.txt file? A robots.txt file is a file that you can put onto your web server to tell search engines what pages not to crawl. It’s basically a set of instructions to tell the search engines where they shouldn’t go. The easiest way to see if you have a robots.txt file on your site is to go to www.yourdomain.com.au/robots.txt

Here’s an example: http://www.seohub.com.au/robots.txt

You have to be very careful about what you disallow: if you disallow the whole website, search engines won’t crawl any of it!
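
To see the difference one line can make, here is a small Python sketch that uses the standard library’s robots.txt parser. The two robots.txt files and the page URLs are invented for illustration; swap in the contents of your own file to test it.

```python
from urllib.robotparser import RobotFileParser

def can_crawl(robots_txt, url, user_agent="*"):
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

# Hypothetical robots.txt that blocks the entire website
BLOCK_EVERYTHING = """\
User-agent: *
Disallow: /
"""

# Hypothetical robots.txt that blocks only one folder
BLOCK_ONE_FOLDER = """\
User-agent: *
Disallow: /admin/
"""

page = "http://www.yourdomain.com.au/aboutus.html"
admin_page = "http://www.yourdomain.com.au/admin/login.html"

print(can_crawl(BLOCK_EVERYTHING, page))        # blocked
print(can_crawl(BLOCK_ONE_FOLDER, page))        # allowed
print(can_crawl(BLOCK_ONE_FOLDER, admin_page))  # blocked
```

A single "Disallow: /" is all it takes to shut every well-behaved spider out of the whole site, so check that line first if nothing is being crawled.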

OK, so that’s what not to do… what’s the right thing to do?
Well, the easiest way to see if your website is crawlable is to either run a site operator search in one of the major search engines and see how many pages you have in the index (just type site:www.yourdomain.com.au into a search engine), or conduct a spider simulation.

Conducting a spider simulation
Conducting a spider simulation on your website can be really handy. What you’re doing is looking at your website the way a search engine’s crawler does. There are loads of tools for conducting spider simulations but the one I use is Webconf’s spider simulator.

All you need to do is add your domain name into the search box and it will show you all the text it can crawl and all of the links the simulator can follow. If this simulator comes up blank then you’re in trouble.
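
If you’re curious what a simulator like that is doing, here is a very rough Python sketch. It is not how Webconf’s tool or any real search engine works; it simply assumes a basic spider reads plain HTML, ignores scripts, and follows ordinary links. The sample pages are made up.

```python
from html.parser import HTMLParser

class SpiderView(HTMLParser):
    """Collect the visible text and the href links found in plain HTML."""

    def __init__(self):
        super().__init__()
        self.text = []
        self.links = []
        self._skip_depth = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only text that sits outside scripts and styles
        if not self._skip_depth and data.strip():
            self.text.append(data.strip())

def simulate_spider(html):
    """Return (text, links) as a simple spider might see them."""
    view = SpiderView()
    view.feed(html)
    view.close()
    return view.text, view.links

# Hypothetical page: one normal link plus a menu written out by JavaScript
html_page = """
<html><body>
<h1>Custom truck parts</h1>
<a href="/aboutus.html">About us</a>
<script>document.write('<a href="/hidden.html">Menu</a>');</script>
</body></html>
"""

# Hypothetical full Flash page: nothing but an embedded movie
flash_page = '<html><body><object data="site.swf"></object></body></html>'

text, links = simulate_spider(html_page)
print(text)
print(links)
print(simulate_spider(flash_page))
```

The link written by JavaScript never shows up in the list, and the Flash page comes back completely empty, which is the "blank" result described above.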

Create deep links throughout your website
A deep link is a link to an internal page of your website. For example, if I were to link the term “how to set up an xml sitemap”, this would be classed as a deep link to another page on SEOhub. Deep linking is great for SEO value and it also gives the search engine crawlers many paths through your website. When deep linking your website, remember to keep the links relevant and try to include keywords in the words you link.

Create an HTML sitemap
Creating an HTML sitemap is another great way to let the engines know what pages are on your site. When I talk about an HTML sitemap, I mean a page on your site that has links to all the other pages on your site, like the sitemap page on this website.

Create an xml sitemap
One of the best ways to ensure your site is fully crawlable is to create an xml sitemap. I won’t go into that here as you can read about xml sitemaps in one of my previous articles.
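
Just to give you a feel for what an xml sitemap looks like, here is a minimal Python sketch that builds one from a list of pages. The page URLs are placeholders, and it only writes the required loc entry for each page; the sitemaps.org format also allows optional extras such as lastmod that I’ve left out.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(page_urls):
    """Return a minimal xml sitemap as a string, one <url> entry per page."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for page_url in page_urls:
        url_element = ET.SubElement(urlset, "url")
        ET.SubElement(url_element, "loc").text = page_url
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body

# Placeholder pages, for illustration only
pages = [
    "http://www.yourdomain.com.au/",
    "http://www.yourdomain.com.au/aboutus.html",
]

sitemap = build_sitemap(pages)
print(sitemap)
```

You would save the output as a file such as sitemap.xml on your web server (the file name here is just the common convention).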

Hopefully this gives you a pretty good outline of how to make sure your site is crawlable. If you have any questions about the crawlability of your site, please send me an email.

 
