Search Engine Crawlers – Treat & Know them like a Friend

on April 3, 2010

What Are Search Engine Crawlers?


The definition of a Web crawler, directly from Wikipedia:

A Web crawler is a computer program that browses the World Wide Web in a methodical, automated manner. Other terms for Web crawlers are ants, automatic indexers, bots, worms, Web spiders, and Web robots.

This process is called Web crawling or spidering. Many sites, in particular search engines, use spidering as a means of providing up-to-date data. Web crawlers are mainly used to create a copy of all the visited pages for later processing by a search engine that will index the downloaded pages to provide fast searches. Crawlers can also be used for automating maintenance tasks on a Web site, such as checking links or validating HTML code. Also, crawlers can be used to gather specific types of information from Web pages, such as harvesting e-mail addresses (usually for spam).

A Web crawler is one type of bot, or software agent. In general, it starts with a list of URLs to visit, called the seeds. As the crawler visits these URLs, it identifies all the hyperlinks in the page and adds them to the list of URLs to visit, called the crawl frontier. URLs from the frontier are recursively visited according to a set of policies.
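To make the seeds and crawl frontier idea concrete, here is a minimal sketch in Python. It is only an illustration, not how Google or Bing actually crawl. The tiny FAKE_WEB dictionary, the sample.com page names, and the function names are all made up for this example, and a real crawler would download each page and extract its links instead of looking them up.

```python
from collections import deque

# Hypothetical in-memory "web": each URL maps to the links found on that page.
# A real crawler would fetch the page over HTTP and parse out its hyperlinks.
FAKE_WEB = {
    "http://www.sample.com/": [
        "http://www.sample.com/about.html",
        "http://www.sample.com/blog/",
    ],
    "http://www.sample.com/about.html": ["http://www.sample.com/"],
    "http://www.sample.com/blog/": [
        "http://www.sample.com/blog/post1.html",
        "http://www.sample.com/about.html",
    ],
    "http://www.sample.com/blog/post1.html": [],
}


def fake_get_links(url):
    """Stand-in for 'download the page and list its hyperlinks'."""
    return FAKE_WEB.get(url, [])


def crawl(seeds, get_links, max_pages=100):
    """Visit pages breadth-first, starting from the seed URLs."""
    frontier = deque(seeds)  # URLs waiting to be visited (the crawl frontier)
    seen = set(seeds)        # URLs already queued, so nothing is visited twice
    visited = []             # the order in which pages were actually visited
    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        visited.append(url)
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return visited


order = crawl(["http://www.sample.com/"], fake_get_links)
for url in order:
    print(url)
```

The "set of policies" in the definition is where real crawlers differ. This sketch uses the simplest one I could think of: first come, first served, with a cap on the number of pages.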

So the Web crawler is the tool that visits your site on its own schedule, often when you are not watching, and traverses as much of it as possible. It makes that information available to search engines such as Google or Bing so they can find your content and rank it properly. The crawler is your friend, and you want to make its visit as easy as possible. If you don’t help the crawler, it will fall back on generic rules that may hurt your search ranking or leave parts of your site unsearchable. It may also waste time on system files, such as CSS files, or on other files that are meaningless to the purpose of your site.

Make the Robots Happy and Productive – Use “robots.txt” Files

If you believe in Christmas and Santa, then you know what I mean when I say you wouldn’t go to bed without leaving some cookies and milk for Santa to put him in a good mood while visiting your house. A simple and standard file called “robots.txt” is what the Robots or Crawlers look for the moment they get to your site. Imagine their disappointment that you didn’t even think of them by leaving a simple robots.txt file.

How the Robots Work

Seriously speaking, the crawler or robot is looking for objects on your site: files, folders, and web links. Upon reaching the root of your site, the robot looks for the robots.txt file and uses it to decide where it may go, which makes the most of the crawl. The file tells the crawler which pages, files, and folders may be crawled and which may not, so the search engine spends its effort on the content you want people to find. Most web pages contain links to other pages, and a spider normally works through them from top left to bottom right, similar to reading a book.

Making your robots.txt File

Robots.txt is a plain text file, not HTML, and it is placed at the root of your web site. Whole books have been written on web crawlers and the use of robots.txt files, but here is a simple start:

Location & naming

1. Name it robots.txt, *not* robot.txt, Robot.Txt, or spider.txt.

2. Add rules to the text file, save it, and place a copy at the root of your web site. Many sites document the rule formats.
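If you are wondering what “the root of your web site” means in practice, this small Python sketch works out the robots.txt address for any page on a site. The function name robots_url and the sample.com page are my own inventions for illustration. The point is only that the file always lives at /robots.txt, directly under the host name, no matter how deep the page is.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url):
    """Return the robots.txt URL for the site that serves page_url."""
    parts = urlsplit(page_url)
    # Keep the scheme and host, replace the path, drop any query or fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url("http://www.sample.com/chat/room1.html?id=7"))
```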

Example 1 – Disallow all robots from specific folders and files

Make a list of everything on your site you DON’T want robots/spiders to visit and put it in the file like this. Note: You could replace the wildcard in the User-agent line with the names of specific robots that you want to ban.

# robots.txt for http://www.sample.com/

User-agent: *
Disallow: /chat/          # Online chat files
Disallow: /testsite/      # This is a test area
Disallow: /login.html     # This is an admin file
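If you want to see how a well-behaved robot would read rules like these, Python ships with a robots.txt parser in its standard library (urllib.robotparser). The sketch below feeds it the example rules plus one extra group that bans a single robot by name. The robot names “ExampleBot” and “BadBot” are made up for illustration, and real crawlers may interpret edge cases differently than this parser does.

```python
from urllib.robotparser import RobotFileParser

# The example rules from this post, plus a hypothetical group that bans
# one specific robot ("BadBot" is an invented name) from the whole site.
RULES = """\
User-agent: *
Disallow: /chat/
Disallow: /testsite/
Disallow: /login.html

User-agent: BadBot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())


def allowed(path, agent="ExampleBot"):
    """Ask the parser whether `agent` may fetch `path` on the sample site."""
    return parser.can_fetch(agent, "http://www.sample.com" + path)


print(allowed("/index.html"))                  # not listed, so it may be crawled
print(allowed("/chat/room1.html"))             # inside a disallowed folder
print(allowed("/login.html"))                  # a disallowed file
print(allowed("/index.html", agent="BadBot"))  # this robot is banned everywhere
```

Keep in mind that robots.txt is a polite request, not a lock. Well-behaved crawlers honor it, but nothing forces a badly behaved robot to do so.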

Good Luck

I hope this gives you a basic understanding of robots.txt and how Web crawlers work. The information here only scratches the surface of what you can do with the robots.txt file. There are tons of sites that focus on it entirely, so I won’t bother reinventing the wheel; I just wanted to get you started.

Sincerely,

The DisplacedGuy (a.k.a. Rich Bianco)

P.S. My daughter Heather is taking ownership of another blog called Otown411, as she wants to help the family situation while I am unemployed. She sees me working day and night and was willing to try to get Otown411 up and running. The site is aimed at people looking to visit or vacation in the Orlando area, and we would offer insider tips on making the most of a vacation, since we live here and know all the ins and outs. It would be great if you could stop over and give her some motivation to stick with it. She is like me and will be checking visitor stats constantly, which is the motivating part. I’m banking on the time I’m spending being an investment… fingers crossed. If any of the sponsor sites on this site are appropriate, please consider visiting them.

34 Responses

parionssport says:

July 13, 2010 at 3:11 am

Hi, I came across this post without being an enthusiast, and I must say it is an eye-opener. Many thanks for this blog. Good luck!

refurbished apple computers says:

July 14, 2010 at 1:15 pm

Thanks for the Blog, thanks for helping me with this fine Article. I think it is really a great topic to write about on my blog. Also here is some good information if needed: refurbished apple computers

Shayari says:

July 16, 2010 at 5:26 am

Hey dear, can I publish a few paragraphs of your post on my little university blog? I have to publish some good articles there, and I think your post fits well. I will be glad to provide a source link as well. I have two blogs, one personal and the other my college blog. I will post part of it on the university blog. Hope you do not mind. Greetings

Kourtney Wankel says:

July 17, 2010 at 2:45 pm

Excellent content. Thanks for posting.

Billy Anstead says:

July 18, 2010 at 2:32 am

Very neat blog post. Really looking forward to reading more. Fantastic.

Gene Stauder says:

July 18, 2010 at 6:06 am

Book marked your webblog. Appreciation for discussing. Absolutely really worth the time from our tests.

Grant Soose says:

July 18, 2010 at 10:05 pm

Thanks for the interesting content!!!

Panda Internet Security Coupons says:

July 20, 2010 at 1:09 pm

greetings there, i just saw your site via bing, and i would like to comment that you express exceptionally good via your site. i am very struck by the mode that you write, and the subject is quality. anyway, i would also love to acknowledge whether you would like to exchange links with my site? i will be certainly more than willing to reciprocate and enter your link off in the blogroll. anticipating for your answer, i would like to convey my appreciation and have a great day!

Dino Vedo says:

August 1, 2010 at 10:35 pm

Another very strong and powerful post. I’ve been reading through some of your previous posts and finally decided to drop a comment on this one. I signed up for your newsletter, so please keep up the informative posts!

Hope you subscribe to my blog as well and leave a few comments here and there! Also would appreciate it if you check out some of my products I created and maybe you could promote them on your blog to make us both some money! Check out my site at : Make Money Online with Dino Vedo.

All the best,
Dino Vedo

PS: Like my Facebook fan page and follow me on Twitter and I’ll do the same for you! 😉

Legalsounds says:

August 9, 2010 at 3:51 pm

Wow!, this was a real quality post. In theory I’d like to write like this too – taking time and real effort to make a good article… but what can I say… I keep putting it off and never seem to get something done

Mario Rizzi says:

August 10, 2010 at 4:38 pm

nice article, i just bookmarked it for future reference. i’d like to check on future articles. how can i set the rss reader again? thanks!

Nuby says:

August 12, 2010 at 10:37 pm

I hope you have a good day! Very good article, well written and very thought out. I am looking forward to reading more of your posts in the future.

Ian Geldmacher says:

August 13, 2010 at 1:59 am

I found your entry interesting thus I’ve added a Trackback to it on my weblog :)…

Erin Grimaldi says:

August 13, 2010 at 3:42 pm

This site is so great that i will honor it with my comment 🙂

bluehost review says:

August 13, 2010 at 10:04 pm

Great, Thank you.

Electronic Cigarettes says:

August 15, 2010 at 11:23 pm

Thanks for the informative article. Will check out more of your blog posts!

parions web says:

August 19, 2010 at 9:02 am

Hello, I found this topic through Google, and I admit it makes you think. Honestly, thanks for this post. Keep it up!

tax attorney says:

September 18, 2010 at 5:25 am

Thanks man. It is cool reading.

kostenlos weltweit bargeld abheben says:

September 18, 2010 at 4:16 pm

I like this website and it has shown me some sort of inspiration to have success for some reason, so thanks. Moreover I´m definitely thinking about mentioning these facts in my own blog!

Lyme disease says:

September 18, 2010 at 6:33 pm

I just assumed i’d distribute and let you realize your weblogs is valuable for uncovered the practical strategy.I genuinely love your weblog.Systematically, the post is in actuality the best on this worth while topic. I concur together with your ideas and will desperately search forward for your forthcoming tweets. Simply just saying thanks will not just be enough, for that brilliant lucidity inside your methods. I will quickly capture your rss feed to remain updated of any updates.Real do the trick and substantially achievements in your give good results and small business tries.Anyways maintain up the very good efforts.Appreciate it.

Truden says:

September 23, 2010 at 11:53 pm

Great post, you have to bookmark it on Digg

Truden

concord discount broker says:

September 24, 2010 at 10:42 am

Thanks for posting about this, I would love to read more….

joshua Thomas says:

October 9, 2010 at 2:12 am

Whats up ! Love your blog thanks for sharing it with us. Support local business.

Vinger Jackson says:

October 10, 2010 at 12:05 am

Are you guys seeing short sales just dominate your market?

- DisplacedGuy says:
  
  October 29, 2010 at 11:08 pm
  
  Yes, short sales and foreclosures are killing our housing market. Home prices have fallen sharply over the last five years and have not shown any sign of leveling out. If I were a buyer I would NOT buy yet; I have a gut feeling that there will be a hard, fast crash in prices before the recovery begins, similar to how an overbought stock needs to shake out all the weak hands.
  Thanks for the comment,
  Sincerely,
  Rich
  
Young Millionaire says:

October 30, 2010 at 5:25 am

Just got a chance to leave a comment so here it is! Excellent post and very interesting stuff! Hope all the best for your blog and your making money online ventures…

Just letting you know that I’ve signed up for your blog newsletter and looking forword to your future posts. It would be great if you’d do the same for my blog… I’ve also created a few products that I promote on my blog and would love if you’d consider promoting them on yours for some quick affiliate cash!

All the best,
Dino Vedo

PS: Like my Facebook fan page and follow me on Twitter and I’ll do the same for you! 😉

http://technologyou.blogspot.com/2010/10/powerpoint-templates.html says:

November 3, 2010 at 2:34 pm

I am really happy with ur blog and will book mark this

Maggie Musulin says:

November 13, 2010 at 9:06 am

Thank you… I’m going to add this to my favorites. If I may ask, what got you started into blogging? To be honest I’ve just been catching on to this hobbie and it’s really begun to inspire me to begin a blog of my own. I’ve tried but nothing that material has occured as of yet. You seem established, hints would be appreciated…

massage therapist says:

November 16, 2010 at 12:31 am

It’s really a nice and helpful piece of information. I’m glad that you shared this helpful info with us. Please keep us informed like this. Thanks for sharing.

William B says:

November 18, 2010 at 2:11 am

Ooohh, If you have a website in English language with 500 unique visitors per day, I can make you earn $200-1000 everyday and the request is after receiving the payment, we share the revenue 50 to 50. This is an invitation sent to you via a group-sending software, which helped me send more than 50,000 invitations to blog writers using wordpress, although only 5 of them established the cooperative relationship with us, they now get $2000-10000 every month. If you are interested in this invitation, please contact us. You will get an auto email reply with an url link liking to detailed information about this project. 😀

Jesus Nozum says:

December 3, 2010 at 12:34 am

We really love this site. I wish I could come here every day, all day.

Hazard Internetowy says:

December 30, 2010 at 11:09 am

You got a definitely helpful blog I’ve been right here reading for about an hour. I’m a newbie and your accomplishment is quite a lot an inspiration for me.

Electronic Cigarette Starter Kit says:

January 1, 2011 at 4:24 pm

Great post bro. I like your writing style.
