Where do search engines start crawling?

0 votes
asked Sep 3, 2008 by agnel-kurian

What do search engine bots use as a starting point? Is it a DNS look-up, or do they start with some fixed list of well-known sites? Any guesses or suggestions?

3 Answers

0 votes
answered Sep 3, 2008 by conroyp

You can submit your site to search engines using their site submission forms - this will get you into their system. When you actually get crawled after that is impossible to say - in my experience it's usually about a week or so for an initial crawl (the homepage and a couple of other pages one link deep from there). You can increase how many of your pages get crawled and indexed by using a clear semantic link structure and submitting a sitemap. A sitemap lets you list all of your pages and weight them relative to one another, which helps the search engines understand how important you consider each part of your site relative to the others.
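
As a rough illustration of the sitemap idea, here is a minimal sketch that builds a sitemap in the sitemaps.org XML format, using the optional `priority` field (0.0 to 1.0) to weight pages relative to one another. The `build_sitemap` helper and the example.com URLs are made up for illustration, and how much weight any given search engine gives to `priority` is not something this sketch assumes.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Return sitemap XML (as a string) for a list of (url, priority) pairs.

    build_sitemap is a hypothetical helper written for this example.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, priority in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        # priority is a relative hint between 0.0 and 1.0
        ET.SubElement(url, "priority").text = f"{priority:.1f}"
    return ET.tostring(urlset, encoding="unicode")


# Placeholder URLs: the homepage is weighted above a secondary page.
sitemap_xml = build_sitemap([
    ("http://www.example.com/", 1.0),
    ("http://www.example.com/about", 0.5),
])
print(sitemap_xml)
```

The resulting XML would typically be saved as sitemap.xml at the root of the site and then submitted to the search engine.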

If your site is linked from other crawled websites, then your site will also be crawled, starting with the linked page and eventually spreading to the rest of your site. This can take a long time, and depends on the crawl frequency of the linking sites, so URL submission is the quickest way to let Google know about you!

One tool I can't recommend highly enough is Google Webmaster Tools. It lets you see how often you've been crawled and any errors Googlebot has stumbled across (broken links, etc.), and it has a host of other useful tools.

0 votes
answered Sep 3, 2008 by mweerden

In principle they start with nothing. Only when somebody explicitly tells them to include a website can they start crawling that site and use the links on it to find more.

However, in practice the creator(s) of a search engine will put in some arbitrary sites they can think of. For example, their own blogs or the sites they have in their bookmarks.
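
The bootstrapping described above (a hand-picked seed list, then following links outward) can be sketched roughly as a breadth-first crawl. This is only an illustration, not how any real search engine is known to work: `crawl`, `LinkExtractor`, and the `FAKE_WEB` dictionary are hypothetical names, and pages are served from an in-memory dictionary so the example runs without network access.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seeds, fetch, max_pages=100):
    """Breadth-first crawl starting from the seed URLs.

    fetch(url) is a caller-supplied function that returns the page's HTML,
    or None if the page cannot be retrieved.
    """
    frontier = deque(seeds)   # URLs waiting to be fetched
    seen = set(seeds)         # URLs already queued, to avoid repeats
    visited = []              # URLs fetched successfully, in crawl order
    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        html = fetch(url)
        if html is None:
            continue
        visited.append(url)
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop any #fragment.
            absolute, _fragment = urldefrag(urljoin(url, href))
            if absolute not in seen:
                seen.add(absolute)
                frontier.append(absolute)
    return visited


# Hypothetical in-memory "web" standing in for real HTTP requests.
FAKE_WEB = {
    "http://seed.example/": '<a href="/about">About</a> '
                            '<a href="http://other.example/">Other</a>',
    "http://seed.example/about": '<a href="/">Home</a>',
    "http://other.example/": '<a href="http://missing.example/">Gone</a>',
}

order = crawl(["http://seed.example/"], FAKE_WEB.get)
print(order)
```

Only sites reachable from the seed list are ever found, which is why the choice of seeds (or an explicit submission) matters. A real crawler would also need politeness delays, robots.txt handling, and error handling, all omitted here.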

In theory one could also just pick some random addresses and see if there is a website there. I doubt anyone does this though; the above method works just fine and does not require extra coding just to bootstrap the search engine.

0 votes
answered Sep 3, 2008 by giovanni-galbo

Your question can be interpreted in two ways:

Are you asking where search engines start their crawl from in general, or where they start to crawl a particular site?

I don't know how the big players work, but if you were to make your own search engine you'd probably seed it with popular portal sites. DMOZ.org seems to be a popular starting point. Since the big players have so much more data than we do, they probably start their crawls from a variety of places.

If you're asking where an SE starts to crawl your particular site, it probably has a lot to do with which of your pages are the most popular. I imagine that if you have one super popular page that lots of other sites link to, then that would be the page SEs enter from, because it has so many more entry points from other sites.

Note that I am not in SEO or anything; I just studied bot and SE traffic for a while for a project I was working on.
