How Googlebot first finds your site
There are essentially four ways in which Googlebot finds your blog.
- The first and most obvious way is for you to submit your URL to Google for crawling, via the “Add URL” form at www.google.com/addurl.html.
- The second way is when Google finds a link to your site on another site that it has already indexed and subsequently sends its spider to follow the link.
- The third way is when you sign up for Google Webmaster Tools, verify your site, and submit a sitemap.
- The fourth (and final) way is when you redirect an already indexed web page to the new page (for example using a 301 redirect, about which there is more later).
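To make the sitemap route a little more concrete, here is a minimal sketch in Python of building the kind of XML sitemap file you would submit. The `build_sitemap` function and the example.com URLs are hypothetical, chosen only for illustration; the one fixed detail is the namespace defined by the sitemaps.org protocol. Most blogging platforms can generate this file for you, so treat this as a picture of what the file contains rather than a recommended tool.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a sitemap XML string listing the given page URLs.

    Hypothetical helper for illustration; it emits only the required
    <loc> element for each page and none of the optional fields.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


# Placeholder URLs; replace with the pages of your own blog.
sitemap_xml = build_sitemap([
    "http://www.example.com/",
    "http://www.example.com/about.html",
])
print(sitemap_xml)
```

You would save the output as a file such as sitemap.xml at the root of your site before submitting its address.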
In the past you could use search engine submission software, but Google now prevents this (and stops spammers bombarding it with new sites) by using a CAPTCHA on its Add URL page. A CAPTCHA is a challenge-response test to determine whether the user is human. The name stands for Completely Automated Public Turing test to tell Computers and Humans Apart, and the test typically takes the form of a distorted image of letters and/or numbers that you have to type in as part of the submission.
How quickly you can expect to be crawled
There are no firm guarantees as to how quickly new blogs/sites – or pages – will be crawled by Google and then appear in the search index. However, following one of the four actions above, you would normally expect to be crawled within a month and then see your pages appear in the index two to three weeks afterwards. In my experience, submission via Google Webmaster Tools is the most effective way to manage your crawl and to be crawled quickly, so I typically do this for all my clients.
Googlebot on your site
Once Googlebot is on your site, it crawls each page in turn. When it finds an internal link, it will remember it and crawl it, either later that visit or on a subsequent trip to your site. Eventually, Google will crawl your whole site.
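The "remember the link and crawl it later" behaviour can be pictured with a small sketch. This is not Google's actual crawler, whose scheduling is not public; it is a simple breadth-first walk over a made-up site, where `crawl_order` and the page paths in `site` are hypothetical names used only for illustration.

```python
from collections import deque


def crawl_order(links, start):
    """Return pages in the order a simple breadth-first crawler visits them.

    `links` maps each page to the internal links found on it.
    """
    seen = {start}          # pages already discovered
    queue = deque([start])  # pages remembered for a later visit
    order = []
    while queue:
        page = queue.popleft()
        order.append(page)
        for target in links.get(page, []):
            if target not in seen:  # remember each new link only once
                seen.add(target)
                queue.append(target)
    return order


# A made-up site: the home page links to a blog directory and an about page.
site = {
    "/": ["/blog/", "/about.html"],
    "/blog/": ["/blog/post-1.html", "/blog/post-2.html", "/"],
    "/blog/post-1.html": ["/blog/post-2.html"],
}
order = crawl_order(site, "/")
print(order)
```

Every page reachable by internal links from the home page ends up in the list exactly once, which is why a page with no internal links pointing to it may never be found this way.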
Imagine that your blog/site is a tree, with the base of the trunk being your home page, your directories the branches, and your pages the leaves on the end of the branches. Google will crawl up the tree like nutrients from the roots, gifting each part of the tree with its all-important PageRank. If your tree is well structured and has good symmetry, the crawl will be even and each branch and leaf will enjoy a proportionate benefit.
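The point about symmetry can be illustrated with the simplified, published form of the PageRank calculation. The sketch below is an assumption-laden toy, not how Google scores real sites: the `pagerank` function, the damping value of 0.85 (the commonly cited figure), the choice to share the score of pages without outgoing links evenly across all pages, and the page paths in `tree` are all illustrative choices.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Approximate PageRank by repeated redistribution of scores.

    `links` maps each page to the pages it links to. Pages with no
    outgoing links share their score evenly across every page, which
    is one common convention and an assumption of this sketch.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Score held by pages that link nowhere (the leaves of the tree).
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1 - damping) / n + damping * dangling / n for p in pages}
        for page, targets in links.items():
            for target in targets:
                # Each page splits its score equally among its links.
                new_rank[target] += damping * rank[page] / len(targets)
        rank = new_rank
    return rank


# A symmetric made-up site: two directories with two pages each.
tree = {
    "/": ["/a/", "/b/"],
    "/a/": ["/a/1.html", "/a/2.html"],
    "/b/": ["/b/1.html", "/b/2.html"],
}
ranks = pagerank(tree)
for page in sorted(ranks):
    print(page, round(ranks[page], 4))
```

In this toy model the two directories end up with the same score, as do all four leaf pages. Adding extra pages under only one directory would tilt the scores toward that branch.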
