What Are Search Engine Crawlers?


The definition of a Web crawler, straight from Wikipedia:

A Web crawler is a computer program that browses the World Wide Web in a methodical, automated manner. Other terms for Web crawlers are ants, automatic indexers, bots, worms[1], Web spiders, and Web robots.

This process is called Web crawling or spidering. Many sites, in particular search engines, use spidering as a means of providing up-to-date data. Web crawlers are mainly used to create a copy of all the visited pages for later processing by a search engine that will index the downloaded pages to provide fast searches. Crawlers can also be used for automating maintenance tasks on a Web site, such as checking links or validating HTML code. Also, crawlers can be used to gather specific types of information from Web pages, such as harvesting e-mail addresses (usually for spam).

A Web crawler is one type of bot, or software agent. In general, it starts with a list of URLs to visit, called the seeds. As the crawler visits these URLs, it identifies all the hyperlinks in the page and adds them to the list of URLs to visit, called the crawl frontier. URLs from the frontier are recursively visited according to a set of policies.
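To make the seeds and crawl frontier idea concrete, here is a minimal sketch in Python. It is not how any particular search engine works. To keep it runnable without a network, it walks a small made-up "web" held in a dictionary; `FAKE_WEB`, `fake_get_links`, and the page URLs are all assumptions invented for this illustration, and the only "policy" is a simple page limit.

```python
from collections import deque

# Hypothetical in-memory "web": each URL maps to the links found on that page.
# A real crawler would download and parse each page instead.
FAKE_WEB = {
    "http://www.sample.com/": [
        "http://www.sample.com/about.html",
        "http://www.sample.com/blog/",
    ],
    "http://www.sample.com/about.html": ["http://www.sample.com/"],
    "http://www.sample.com/blog/": [
        "http://www.sample.com/blog/post1.html",
        "http://www.sample.com/about.html",
    ],
    "http://www.sample.com/blog/post1.html": [],
}


def fake_get_links(url):
    """Stand-in for 'download the page and extract its hyperlinks'."""
    return FAKE_WEB.get(url, [])


def crawl(seeds, get_links, max_pages=100):
    """Visit pages breadth-first from the seeds; return URLs in visit order."""
    frontier = deque(seeds)  # URLs waiting to be visited (the crawl frontier)
    seen = set(seeds)        # URLs already queued, so nothing is visited twice
    visited = []

    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        visited.append(url)
        # Add every newly discovered link to the back of the frontier.
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return visited


order = crawl(["http://www.sample.com/"], fake_get_links)
```

The `seen` set is what keeps the crawler from looping forever when pages link back to each other, which the home page and about page do here.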

So the Web crawler is the tool that visits your site at any hour and traverses as much of it as it can, making that information available to search engines such as Google or Bing so they can find your content and rank it properly.  The crawler is your friend, and you want to make its visit as easy as possible.  If you don’t help the crawler, it falls back on generic rules that may hurt your search ranking or leave parts of your site unsearchable. It may also waste time on supporting files, such as CSS files, that are meaningless to the purpose of your site.

Make the Robots Happy and Productive – Use “robots.txt” Files

If you believe in Christmas and Santa, then you know what I mean when I say you wouldn’t go to bed without leaving some cookies and milk for Santa to put him in a good mood while visiting your house.  A simple and standard file called “robots.txt” is what the robots or crawlers look for the moment they get to your site.  Imagine their disappointment when they find you didn’t even think to leave them a simple robots.txt file.

How the Robots Work

Seriously speaking, the crawler or robot is looking for objects on your site, and objects are files, folders, and web links.  Upon reaching the root of your site, the robot looks for the robots.txt file and uses it to decide which pages, files, and folders it may crawl and which it should skip, so the crawl is spent on the content that matters.  Most web pages contain links to other pages, and a spider will normally follow them from top left to bottom right, similar to reading a book.

Making your robots.txt File

Robots.txt is a plain text file, not HTML, and it is placed at the root of your web site.  There are books written on the subject of web crawlers and the use of robots.txt files, but here is a simple start:

Location & naming

1. Name it robots.txt, all lowercase, *not* robot.txt, Robot.Txt, or spider.txt.

2. Add rules to the text file, save it, and place a copy at the root of your web site.  Many sites document the rule formats.
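Since the location matters so much, here is a small sketch of how a crawler might work out where to look for the file, given any page on a site. The helper name `robots_txt_url` and the sample page URL are made up for this illustration; the parsing functions come from Python's standard `urllib.parse` module.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url):
    """Return the robots.txt URL for the site that hosts page_url."""
    parts = urlsplit(page_url)
    # Keep the scheme and host, swap in the fixed path, drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


location = robots_txt_url("http://www.sample.com/blog/post1.html?page=2")
```

No matter how deep the page is, the answer is always the same file at the top of the site, which is why a copy tucked away in a subfolder will not be found.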

Example 1 – Disallow all robots from specific folders and files

Make a list of everything on your site you DON’T want robots/spiders to visit and put it in the file like this.  Note: You could replace the wildcard in the User-agent line with the names of specific robots that you want to ban.

# robots.txt for http://www.sample.com/

User-agent: *
Disallow: /chat/          # Online chat files
Disallow: /testsite/      # This is a test area
Disallow: /login.html     # This is an admin file
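If you want to check rules like these before uploading them, Python's standard library includes a robots.txt parser, `urllib.robotparser`. The sketch below feeds it the same kind of rules as text and asks which paths a crawler may fetch. The agent names `ExampleBot` and `BadBot` and the helper `allowed` are made up for this illustration, and real crawlers may interpret edge cases differently from this parser. Note that the User-agent line and its Disallow lines are kept together with no blank line between them, since this parser treats a blank line as the end of a group of rules.

```python
from urllib.robotparser import RobotFileParser

# Rules that apply to every robot (the * wildcard).
RULES = """\
# robots.txt for http://www.sample.com/
User-agent: *
Disallow: /chat/
Disallow: /testsite/
Disallow: /login.html
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())


def allowed(path, agent="ExampleBot"):
    """Ask the parser whether the given agent may fetch this path."""
    return parser.can_fetch(agent, "http://www.sample.com" + path)


chat_ok = allowed("/chat/room1.html")   # under a disallowed folder
login_ok = allowed("/login.html")       # a disallowed file
home_ok = allowed("/index.html")        # not mentioned in the rules

# Banning one specific (hypothetical) robot instead of using the wildcard.
BAN_ONE = """\
User-agent: BadBot
Disallow: /
"""

ban_parser = RobotFileParser()
ban_parser.parse(BAN_ONE.splitlines())

badbot_ok = ban_parser.can_fetch("BadBot", "http://www.sample.com/index.html")
other_ok = ban_parser.can_fetch("ExampleBot", "http://www.sample.com/index.html")
```

With the first set of rules, the chat and login paths should come back as not allowed while the home page is allowed. With the second set, only the robot named in the User-agent line is turned away.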

Good Luck

I hope this gives you a basic understanding of robots.txt and how Web crawlers work.  The information here only scratches the surface of what you can do with the robots.txt file. There are tons of sites focusing on it entirely, so I won’t bother reinventing the wheel; I just wanted to get you started.


The DisplacedGuy  (a.k.a. Rich Bianco)

P.S.  My daughter Heather is taking ownership of another blog called Otown411, as she wants to help the family situation while I am unemployed. She sees me working day and night and was willing to try to get Otown411 up and running. The site is aimed at people looking to visit or vacation in the Orlando area, and we will offer insider tips on making the most of a vacation, since we live here and know all the ins and outs.  It would be great if you could stop over and give her some motivation to stick with it.  She is like me and will be checking visitor stats constantly, which is the motivating part.  I’m banking on the time I’m spending being an investment…  fingers crossed.  If any of the sponsor sites on this site are appropriate, please consider visiting them.
