Think about how many times each day you use a search engine. It's probably a lot more than you realize. Have you ever thought about exactly how Google can come up with 154,000,000 websites related to "soda" in less than a quarter of a second?! Probably not.
But understanding how search engines work and identifying parts of a results page will not only make you a more savvy searcher, it can also help your business or website. By making your website as appealing to search engines as possible (search engine optimization), you can end up seeing a lot more traffic.
Search engines such as Google, Bing, and Yahoo work extremely hard to provide internet users with the most relevant and useful information. To do so, they must first scan the vast World Wide Web and download information from billions of web pages. These pages are retrieved by a web crawler, also referred to as a spider. A spider starts with a list of URLs to visit, called the seeds. As the spider visits these URLs, it identifies all the hyperlinks on each page and adds them to the list of URLs still to visit, called the crawl frontier. (The crawl frontier is not the same thing as the index, which stores the content the spider downloads.)
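To make the seeds-and-frontier idea concrete, here is a minimal sketch in Python. It is not how any real search engine is implemented: the tiny `TOY_WEB` dictionary, the example URLs, and the function names are all made up for illustration, and the "fetching" is just a dictionary lookup rather than a real download.

```python
from collections import deque

# Hypothetical miniature "web": each URL maps to the links found on that page.
TOY_WEB = {
    "https://a.example/": ["https://a.example/about", "https://b.example/"],
    "https://a.example/about": ["https://a.example/"],
    "https://b.example/": ["https://c.example/", "https://a.example/about"],
    "https://c.example/": [],
}


def toy_fetch_links(url):
    # Stand-in for downloading a page and extracting its hyperlinks.
    return TOY_WEB.get(url, [])


def crawl(seeds, fetch_links, max_pages=100):
    """Visit pages breadth-first, starting from the seed URLs."""
    frontier = deque(seeds)  # URLs waiting to be visited (the crawl frontier)
    seen = set(seeds)        # URLs already queued, so nothing is queued twice
    visited = []             # URLs in the order the spider visited them
    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        visited.append(url)
        for link in fetch_links(url):
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return visited


order = crawl(["https://a.example/"], toy_fetch_links)
print(order)
```

Notice that `https://c.example/` is never listed as a seed. The spider only finds it because another page links to it, which is exactly why links matter so much for getting discovered.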
So let's put this in perspective: You've decided to launch your own website for fitness and health tips. How will Google find out about your shiny new site?
One way is through external links. Typically, a spider reaches your site by following a link from another website. If no other sites link to your fitness and health website yet, your best bet is to submit a sitemap, which is a list of the pages on a website that helps users and crawlers navigate through it. This will alert crawlers that something is new, and they will make their way to you. If you decide you would rather not go through any of that trouble, your site will probably still be found eventually, but it will take longer and the crawlers may not find all of your pages.
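If you are curious what a sitemap looks like under the hood, here is a rough sketch that builds a bare-bones XML sitemap with Python's standard library. The `fitness.example` URLs and the `build_sitemap` helper are hypothetical, and a real sitemap file would normally also start with an XML declaration and may carry optional fields such as a last-modified date.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(page_urls):
    """Return a minimal sitemap as an XML string, one <url><loc> per page."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for page_url in page_urls:
        url_el = ET.SubElement(urlset, "url")
        ET.SubElement(url_el, "loc").text = page_url
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical pages for the fitness and health site.
pages = [
    "https://fitness.example/",
    "https://fitness.example/blog",
    "https://fitness.example/workouts/beginner",
]
sitemap_xml = build_sitemap(pages)
print(sitemap_xml)
```

Most website platforms and plugins will generate this file for you, so treat the sketch as a peek at the format rather than something you need to write by hand.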
Here are a few tips to increase search engine crawling:
- Build an adequate link structure. In terms of navigation, place your most important pages close to the homepage. You want the most relevant content to be readily available. Spiders may not crawl deep enough into your website to see buried pages and content.
- Consistently create content. Adding new content on a regular basis is extremely important. One way to do so is to create a blog. The more quality content you add, the more often spiders will return to your site.
- Start link building. Again, spiders like to extend their nets via other links, so if you have links coming in from other quality sites, you can increase your crawl rate.
- Clean up your URLs. SEO-friendly URLs are a must! In order to prevent crawler frustration and ensure your web page is explored extensively, do without the weird symbols or extraneous characters.
- Check for broken links. Make sure all of your links are working, especially internal links. If a crawler is going through your site and hits a dead end, consider Spidey squashed on that path: it cannot follow the link any further, and pages reachable only through the broken link may never get crawled.
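Two of the tips above, keeping important pages close to the homepage and checking for broken internal links, are easy to check with a little code. The sketch below assumes you already have a map of which page links to which; the `SITE` dictionary, its page paths, and the `audit_site` helper are invented for illustration.

```python
from collections import deque

# Hypothetical site: each page path maps to the internal links it contains.
SITE = {
    "/": ["/blog", "/workouts"],
    "/blog": ["/blog/stretching-basics", "/old-page"],
    "/workouts": ["/workouts/beginner"],
    "/blog/stretching-basics": [],
    "/workouts/beginner": ["/workouts/beginner/week-1"],
    "/workouts/beginner/week-1": [],
}


def audit_site(site, home="/"):
    """Return each page's click depth from the homepage, plus broken links."""
    depth = {home: 0}  # number of clicks needed to reach each page
    broken = []        # (page, target) pairs where the target does not exist
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in site[page]:
            if target not in site:
                broken.append((page, target))
            elif target not in depth:
                depth[target] = depth[page] + 1
                queue.append(target)
    return depth, broken


depth, broken = audit_site(SITE)
for page, clicks in sorted(depth.items(), key=lambda item: item[1]):
    print(clicks, page)
print("Broken links:", broken)
```

Here the page three clicks deep is the one a spider is most likely to miss, and the link to `/old-page` is the dead end to fix.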
The next step in how a search engine works is indexing. Indexing is exactly what it sounds like: the process in which search engines store the content downloaded by spiders. So let's pretend you really want to know how to make red velvet cookies. You go to Google and type 'how to make red velvet cookies' into the search bar. Instead of scanning billions of web pages for the recipe on the spot, Google already has it on file in its index.
This lookup happens each time you search; the system only has to run through an already created database. Once a web page is discovered, web crawlers come back to it periodically. Each time a crawler finds new information on your site, the search engine updates the content stored in its index, keeping the information as up to date and relevant as possible. Very common words such as "it," "is," or "for" (often called stop words) have traditionally been given little or no weight in the process of indexing. Important phrases and keywords are what allow quick retrieval of information from the index.
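Here is a toy version of that idea, often called an inverted index: a lookup table from each keyword to the pages that contain it. This is only a sketch under simplifying assumptions. The recipe URLs, the short `STOP_WORDS` list, and the function names are made up, and real search engines use far more sophisticated ranking than a simple "all words must match" rule.

```python
import re
from collections import defaultdict

# A small, illustrative list of common words to skip (an assumption, not a standard).
STOP_WORDS = {"it", "is", "for", "a", "the", "to", "and", "of"}


def tokenize(text):
    """Lowercase the text, split it into words, and drop the stop words."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [word for word in words if word not in STOP_WORDS]


def build_index(pages):
    """Map each keyword to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs that contain every keyword in the query."""
    words = tokenize(query)
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# Hypothetical pages a spider might have downloaded.
PAGES = {
    "https://recipes.example/red-velvet-cookies": "How to make red velvet cookies",
    "https://recipes.example/red-velvet-cake": "Red velvet cake is a classic for birthdays",
    "https://recipes.example/oatmeal-cookies": "How to make oatmeal cookies",
}

index = build_index(PAGES)
hits = search(index, "how to make red velvet cookies")
print(hits)
```

The search never rereads the pages themselves. It only looks up each keyword in the table built ahead of time, which is why the answer can come back so quickly.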
I hope today's post helped you understand how much information is at your fingertips once you press that search button. These creepy crawlers that continuously scan the vast World Wide Web are nothing short of amazing.