How Search Engines Work - Part II
In the last article, we talked about how search engines work. This article looks briefly at the elements of your website that can keep the search engine spiders, or “bots”, from quickly and easily gathering the information they need to index your site properly.
Certain types of navigation can keep the search engines from getting to the important information they need. Search engine spiders crawl the web by following hyperlinks. They need these links to find new pages and to see changes made to existing pages and documents. Without knowing it, you may be building pages with complex links, or a deep site structure with little unique content. When that happens, you are effectively putting up walls that make it hard for the spiders to move about freely and gather what they need to index your site well.
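To make the idea of crawling by hyperlink concrete, here is a minimal sketch of how a very simple spider might pull the links out of a page. It is only an illustration, not how any particular search engine works; the class name `LinkCollector`, the function `extract_links`, and the sample page are all made up for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag, the way a simple spider might."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page's own URL.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Return the absolute URL of every plain HTML link in the page."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return collector.links


# Hypothetical page: two plain links and one form.
sample_page = """
<a href="/about.html">About</a>
<a href="products/widgets.html">Widgets</a>
<form action="/search"><input type="submit"></form>
"""
found_links = extract_links(sample_page, "http://www.yoursite.com/")
```

In this sketch only the two plain links end up in `found_links`. The form's target is never collected, which is the kind of wall the rest of this article describes.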
· One example is a complex URL with 2 or more dynamic parameters, such as: http://www.yoursite.com/index.html?id=4bplDC&mo8%enuff%=Jnk Spiders may be reluctant to follow links like these because they can return errors to the bots.
· Pages with 100 or more unique links to other pages on the site, and pages that are buried deep in the site, can also cause trouble. Deep pages are those that are more than 3 levels (or links) away from the home page of the website.
· Other hindrances are pages that require a cookie or a session ID to enable navigation, and pages that are split into frames.
· Also stay away from pages that can be reached only by submitting a form, only through a drop-down menu link, or only from a search box or after a login.
· Spiders will not index pages or documents that are purposefully blocked with a robots meta tag or something similar, and they may have trouble with pages that use a redirect before the end content is reached. Redirects that show the spiders different content than your visitors see are referred to as cloaking. Of course, cloaking may get you banned, period, so it’s not suggested.
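Two of the walls in the list above are easy to check for yourself. The sketch below counts the query-string parameters in a URL and works out how many links away from the home page each page sits. The cutoffs (2 or more parameters, more than 3 levels deep) come from this article, not from any search engine's published rules, and the function names and the small `site` map are invented for illustration.

```python
from collections import deque
from urllib.parse import urlsplit, parse_qsl


def count_query_parameters(url):
    """Return how many query-string parameters a URL carries."""
    query = urlsplit(url).query
    return len(parse_qsl(query, keep_blank_values=True))


def click_depths(link_graph, home):
    """Breadth-first search: fewest links needed to reach each page from home."""
    depths = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in link_graph.get(page, []):
            if target not in depths:
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


# The example URL from the list above.
complex_url = "http://www.yoursite.com/index.html?id=4bplDC&mo8%enuff%=Jnk"
parameter_count = count_query_parameters(complex_url)

# Hypothetical site map: each page lists the pages it links to.
site = {
    "/": ["/products", "/about"],
    "/products": ["/products/widgets"],
    "/products/widgets": ["/products/widgets/blue"],
    "/products/widgets/blue": ["/products/widgets/blue/large"],
}
depths = click_depths(site, "/")
# Pages more than 3 links from the home page count as "deep" in this article.
deep_pages = [page for page, depth in depths.items() if depth > 3]
```

Here `parameter_count` comes out as 2, and the only page flagged as deep is `/products/widgets/blue/large`, which sits 4 links from the home page.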
To avoid these walls and make sure your site gets the search engine love it deserves, provide direct HTML links that point to each page you’d like the search engine spiders to index. Keep in mind that it’s best if the pages are accessible from the home page, as this is where most spiders start their crawl. Pages that are not linked to from the front page are less likely to be seen. A great way to make sure that the spiders see everything is to have a sitemap. Sitemaps are great for both human visitors and the search engines.
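The article does not say which kind of sitemap to use. As one possibility, here is a rough sketch that builds an XML sitemap in the sitemaps.org format, which is aimed at search engines rather than human visitors. The function name `build_sitemap` and the page URLs are placeholders for this example.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(page_urls):
    """Return an XML sitemap string with one <url><loc> entry per page."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for page_url in page_urls:
        url_element = ET.SubElement(urlset, "url")
        ET.SubElement(url_element, "loc").text = page_url
    return ET.tostring(urlset, encoding="unicode")


# Placeholder pages; list every page you want the spiders to find.
sitemap_xml = build_sitemap([
    "http://www.yoursite.com/",
    "http://www.yoursite.com/about.html",
])
```

A sitemap page for human visitors would be an ordinary HTML page of direct links instead, and it gives the spiders the same kind of plain links described above.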
Tags: robots meta tag, search engine spiders, Search Engines


