What Are Search Engine Crawlers?
The definition of a Web crawler, straight from Wikipedia:
A Web crawler is a computer program that browses the World Wide Web in a methodical, automated manner. Other terms for Web crawlers are ants, automatic indexers, bots, worms, Web spiders, and Web robots.
This process is called Web crawling or spidering. Many sites, in particular search engines, use spidering as a means of providing up-to-date data. Web crawlers are mainly used to create a copy of all the visited pages for later processing by a search engine that will index the downloaded pages to provide fast searches. Crawlers can also be used for automating maintenance tasks on a Web site, such as checking links or validating HTML code. Also, crawlers can be used to gather specific types of information from Web pages, such as harvesting e-mail addresses (usually for spam).
A Web crawler is one type of bot, or software agent. In general, it starts with a list of URLs to visit, called the seeds. As the crawler visits these URLs, it identifies all the hyperlinks in the page and adds them to the list of URLs to visit, called the crawl frontier. URLs from the frontier are recursively visited according to a set of policies.
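To make the seeds and the crawl frontier a bit more concrete, here is a minimal sketch in Python. It does not touch the network: the PAGES dictionary is a made-up stand-in for a tiny web site, and the only "policies" assumed here are breadth-first order, a page limit, and never visiting the same URL twice. Real crawlers use far more elaborate policies.

```python
from collections import deque

# Hypothetical in-memory "web": each URL maps to the links found on that page.
PAGES = {
    "http://www.sample.com/": [
        "http://www.sample.com/about.html",
        "http://www.sample.com/blog/",
    ],
    "http://www.sample.com/about.html": ["http://www.sample.com/"],
    "http://www.sample.com/blog/": [
        "http://www.sample.com/blog/post1.html",
        "http://www.sample.com/about.html",
    ],
    "http://www.sample.com/blog/post1.html": [],
}


def crawl(seeds, max_pages=100):
    """Visit pages breadth-first from the seeds and return the visit order."""
    frontier = deque(seeds)  # URLs waiting to be visited (the crawl frontier)
    seen = set(seeds)        # URLs already queued, so none is visited twice
    visited = []
    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        visited.append(url)
        # Stand-in for downloading the page and identifying its hyperlinks.
        for link in PAGES.get(url, []):
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return visited


order = crawl(["http://www.sample.com/"])
for url in order:
    print(url)
```

Swapping the deque for a priority queue would be one way to express a different visiting policy, for example visiting important pages first.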
So the Web crawler is the tool that visits your site, often late at night, and traverses as much of it as it can. It makes that information available to search engines such as Google or Bing so they can find your content and rank it properly. The crawler is your friend, and you want to make its visit as easy as possible. If you don’t help the crawler, it will fall back on generic rules that may hurt your search ranking or leave parts of your site unsearchable. It may also waste time on files that mean nothing to the purpose of your site, such as CSS or other system files.
Make the Robots Happy and Productive – Use “robots.txt” Files
If you believe in Christmas and Santa, then you know what I mean when I say you wouldn’t go to bed without leaving some cookies and milk for Santa to put him in a good mood while visiting your house. A simple and standard file called “robots.txt” is what the Robots or Crawlers look for the moment they get to your site. Imagine their disappointment that you didn’t even think of them by leaving a simple robots.txt file.
How the Robots Work
Seriously speaking, the crawler or robot is looking for objects on your site: files, folders, and web links. Upon reaching the root of your site, the robot looks for the robots.txt file and uses it to decide which parts of your site it may crawl, which makes the most of the visit. That in turn helps the search engine build its indexes so searchers can find your pages faster. The crawler then sorts out which objects are web pages, files, or folders, and which of them can be indexed. Most web pages contain links to other pages, and a spider normally follows them in the order they appear on the page, from top left to bottom right, much like reading a book.
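Here is a rough sketch of how a spider might pick the links out of a page in reading order, using Python's standard html.parser module. The LinkCollector class and the sample PAGE markup are invented for illustration; real crawlers handle many more cases (frames, redirects, malformed HTML, and so on).

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, in the order it appears."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Turn relative links into absolute URLs.
                    self.links.append(urljoin(self.base_url, value))


# Hypothetical page content, made up for this example.
PAGE = """
<html><body>
<a href="/about.html">About</a>
<p>Some text <a href="blog/post1.html">a post</a></p>
<a href="http://www.example.org/">Elsewhere</a>
</body></html>
"""

collector = LinkCollector("http://www.sample.com/")
collector.feed(PAGE)
for link in collector.links:
    print(link)
```

The links come out top to bottom, the same order a person would meet them when reading the page.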
Making your robots.txt File
The robots.txt file is a plain text file, not HTML, and it goes at the root of your web site. There are books written on the subject of web crawlers and the use of robots.txt files, but here is a simple start:
Location & naming
1. Name it robots.txt, *not* robot.txt, Robot.Txt, or spider.txt.
2. Add rules to the text file, save it, and place a copy at the root of your web site. Many sites document the rule formats.
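Because the name and location are fixed, a robot can work out where your robots.txt lives from any page address on your site. A small sketch of that idea in Python follows; the function name robots_txt_url and the sample page address are my own, used only for illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url):
    """Return the robots.txt URL for the site that hosts page_url."""
    parts = urlsplit(page_url)
    # Keep the scheme and host; replace the path and drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_txt_url("http://www.sample.com/chat/room1.html?id=7"))
# prints http://www.sample.com/robots.txt
```

This is why a copy placed in a subfolder, or saved under a different name, will simply never be found.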
Example 1 – Disallow all robots from specific folders and files
Make a list of everything on your site you DON’T want robots/spiders to visit and put it in the file like this. Note: You could replace the wildcard in the User-agent line with the names of specific robots that you want to ban.
# robots.txt for http://www.sample.com/
User-agent: *
Disallow: /chat/        # Online chat files
Disallow: /testsite/    # This is a test area
Disallow: /login.html   # This is an admin file
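If you want to check how a well-behaved robot would read these rules, Python ships a parser for them in its standard library (urllib.robotparser). The sketch below feeds it the example above and asks about a few paths. The robot names "ExampleBot" and "BadBot" and the sample page paths are made up for illustration, and the second rule set is my own example of banning one specific robot instead of using the wildcard.

```python
from urllib.robotparser import RobotFileParser

RULES = """\
# robots.txt for http://www.sample.com/
User-agent: *
Disallow: /chat/        # Online chat files
Disallow: /testsite/    # This is a test area
Disallow: /login.html   # This is an admin file
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())


def allowed(path, agent="ExampleBot"):
    """Return True if the (hypothetical) robot may fetch the given path."""
    return parser.can_fetch(agent, "http://www.sample.com" + path)


for path in ["/index.html", "/chat/room1.html", "/testsite/", "/login.html"]:
    print(path, "allowed" if allowed(path) else "blocked")

# Banning one specific robot instead of all of them (hypothetical name).
BAN_ONE = """\
User-agent: BadBot
Disallow: /
"""

ban_parser = RobotFileParser()
ban_parser.parse(BAN_ONE.splitlines())
print(ban_parser.can_fetch("BadBot", "http://www.sample.com/index.html"))
print(ban_parser.can_fetch("ExampleBot", "http://www.sample.com/index.html"))
```

Keep in mind that robots.txt is a request, not a lock: polite crawlers honor it, but nothing forces a badly behaved robot to do so.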
Good Luck
I hope this gives you a basic understanding of robots.txt and how Web crawlers work. The information here only scratches the surface of what you can do with the robots.txt file. There are tons of sites focusing on it entirely, so I won’t bother reinventing the wheel; I just wanted to get you started.
Sincerely,
The DisplacedGuy (a.k.a. Rich Bianco)
P.S. My daughter Heather is taking ownership of another blog called Otown411, as she wants to help the family situation with me being unemployed. She sees me working day and night and was willing to try to get Otown411 up and running. The site is targeted towards people looking to visit or vacation in the Orlando area, and we would offer insider tips about maximizing the vacation since we live here and know all the ins-and-outs. It would be great if you could stop over and give her some motivation to stick with it. She is like me and will be checking visitor stats constantly, which is the motivating part. I’m banking on this time I’m spending as being an investment… fingers crossed. If any of the sponsor sites on this site are appropriate, please consider visiting them.