How Does Google Crawl Your Website?

Search engines are tools or websites that provide users with results for a user-defined query. A user-defined query is anything a user types into the search box, e.g. the keyword 'akshadainfosystem.com' searched on Google.

Different search engines use different algorithms to produce results for a user-defined query. Yahoo, Bing, and Google are examples of search engines; Google is the most popular search engine in the world.

After deploying your website on a server, you have established your presence on the internet. Next, use any tool to generate a sitemap.xml and upload it to the server where your website is hosted. Then create a webmaster (Google Search Console) account with your credentials. Once the account is created, verify your ownership of the site and submit your sitemap to Google.
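
If you don't have a generator tool handy, a sitemap is simply an XML file listing your URLs. Below is a minimal Python sketch (the URLs are placeholders; list your own site's pages instead):

    # Minimal sitemap.xml generator -- a sketch, not a full-featured tool.
    # The URLs below are placeholders; substitute your own site's pages.
    from xml.etree import ElementTree as ET

    urls = [
        "http://www.akshadainfosystem.com/",
        "http://www.akshadainfosystem.com/service/web-developement",
    ]

    urlset = ET.Element("urlset", xmlns="http://www.sitemaps.org/schemas/sitemap/0.9")
    for page in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = page

    # Writes sitemap.xml with an XML declaration, ready to upload to your host.
    ET.ElementTree(urlset).write("sitemap.xml", encoding="utf-8", xml_declaration=True)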

No one can guarantee when your website will be indexed; generally Google indexes a website within 3 to 5 days. After indexing, you can see how many URLs have been indexed either in your webmaster account or with a 'site:domain' query, e.g. site:http://www.akshadainfosystem.com in Google search. You will get all indexed links, each with a heading and a short description.

Crawling:

Crawling is the process by which Googlebot discovers new and updated pages to be added to the Google index.

Google uses a huge set of computers to fetch (or "crawl") billions of pages on the web. The program that does the fetching is called Googlebot (also known as a robot, bot, or spider). Googlebot uses an algorithmic process: computer programs determine which sites to crawl, how often, and how many pages to fetch from each site.

Google's crawl process begins with a list of web page URLs, generated from previous crawl processes, and augmented with Sitemap data provided by webmasters. As Googlebot visits each of these websites it detects links on each page and adds them to its list of pages to crawl. New sites, changes to existing sites, and dead links are noted and used to update the Google index.
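
This process can be pictured as a loop over a "frontier" of URLs. The following is a toy Python sketch of that loop only, not Google's actual implementation; the seed URL is a placeholder, and a real crawler would also honour robots.txt, politeness delays, and run at massive scale:

    # A toy crawl loop: start from seed URLs (e.g. from previous crawls and
    # Sitemap data), fetch each page, extract links, and add newly
    # discovered URLs to the frontier.
    import re
    import urllib.request
    from collections import deque

    MAX_PAGES = 50                                  # safety cap for this sketch
    seeds = ["http://www.akshadainfosystem.com/"]   # placeholder seed list
    frontier = deque(seeds)
    seen = set(seeds)
    fetched = 0

    while frontier and fetched < MAX_PAGES:
        url = frontier.popleft()
        try:
            html = urllib.request.urlopen(url, timeout=10).read().decode("utf-8", "ignore")
        except OSError:
            continue  # dead link: note it and move on
        fetched += 1
        # Naive link extraction; a real crawler uses a proper HTML parser.
        for link in re.findall(r'href="(http[^"]+)"', html):
            if link not in seen:      # newly discovered page
                seen.add(link)
                frontier.append(link)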

Crawler:

A crawler is a service or agent that crawls websites. Generally speaking, a crawler automatically and recursively accesses known URLs of a host that exposes content which can be accessed with standard web-browsers. As new URLs are found (through various means, such as from links on existing, crawled pages or from Sitemap files), these are also crawled in the same way.

Spider trap:

A spider trap (or crawler trap) is a set of web pages that may intentionally or unintentionally be used to cause a web crawler or search bot to make an infinite number of requests or cause a poorly constructed crawler to crash. Web crawlers are also called web spiders, from which the name is derived. Spider traps may be created to "catch" spambots or other crawlers that waste a website's bandwidth. They may also be created unintentionally by calendars that use dynamic pages with links that continually point to the next day or year.

Common techniques used are (a simple crawler-side guard against these traps is sketched after the list):

· Creation of indefinitely deep directory structures, e.g. endlessly repeating path segments such as

http://www.akshadainfosystem.com/service/web-developement/service/web-developement/…

· Dynamic pages that produce an unbounded number of documents for a web crawler to follow. Examples include calendars and algorithmically generated language poetry.

· Documents filled with a large number of characters, crashing the lexical analyzer parsing the document.
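
A well-constructed crawler defends itself against such traps by capping how deep and how many pages it will fetch from a host. A minimal Python sketch of such guards (the limits are arbitrary values chosen for illustration):

    # Crawler-side guards against spider traps -- a sketch only.
    from urllib.parse import urlparse

    MAX_DEPTH = 10       # reject indefinitely deep directory structures
    MAX_PER_HOST = 1000  # cap pages fetched from any single host

    pages_per_host = {}

    def should_crawl(url):
        parts = urlparse(url)
        depth = len([seg for seg in parts.path.split("/") if seg])
        if depth > MAX_DEPTH:
            return False  # likely a trap of endlessly repeating path segments
        count = pages_per_host.get(parts.netloc, 0)
        if count >= MAX_PER_HOST:
            return False  # unbounded dynamic pages, e.g. an endless calendar
        pages_per_host[parts.netloc] = count + 1
        return True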

Indexing:

Indexing is done by Googlebot, which processes the pages it has crawled. Googlebot processes each page in order to compile a massive index of all the words it sees and their location on each page. In addition, it processes information included in key content tags and attributes, such as title tags and ALT attributes. Googlebot can process many, but not all, content types; for example, it cannot process the content of some rich media files or dynamic pages.
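
At its core, an index of "all the words and their locations" is an inverted index: a mapping from each word to the pages and positions where it occurs. A toy Python sketch of the idea (drastically simplified compared with Google's real index):

    # A toy inverted index: for each word, record the pages and positions
    # where it appears.
    from collections import defaultdict

    index = defaultdict(list)   # word -> list of (page, position) pairs

    def index_page(url, text):
        # Record every word and where it occurs on the page.
        for position, word in enumerate(text.lower().split()):
            index[word].append((url, position))

    index_page("http://www.akshadainfosystem.com/",
               "website development and design services")
    print(index["development"])  # [('http://www.akshadainfosystem.com/', 1)]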

How are results displayed for a user-defined query?

When you input a query, Google's systems search the content indexed in their system and return the results most relevant to the user. For example, if you offer a service such as website development and design, and a user enters the query 'website development and design company', Google searches all the content indexed in its system (simply put, in its databases) and returns the results most relevant to that user. This is the same as searching for something anywhere else.
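
Conceptually, answering a query means looking each query word up in the index and ranking pages that match more words higher. A toy Python sketch of that lookup (the index contents and page names here are made up, and real relevance ranking uses far more signals):

    # Score each page by how many query words it contains; most relevant first.
    index = {                  # word -> pages containing it (made-up data)
        "website":     ["pageA", "pageB"],
        "development": ["pageA"],
        "design":      ["pageA", "pageC"],
        "company":     ["pageB"],
    }

    def search(query):
        scores = {}
        for word in query.lower().split():
            for page in index.get(word, []):
                scores[page] = scores.get(page, 0) + 1
        # Pages matching more query words come first.
        return sorted(scores, key=scores.get, reverse=True)

    print(search("website development and design company"))
    # ['pageA', 'pageB', 'pageC']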

While showing results, Google draws on rich datasets and ranks results according to its own defined standards (factors). Some factors are as follows:

1. PageRank:

PageRank is the measure of the importance of a page based on the incoming links from other pages. In simple terms, each link to a page on your site from another site adds to your site's PageRank.

There are two broad levels of PageRank a page can have:

a) High PageRank

b) Low PageRank
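
The classic PageRank computation can be sketched as power iteration over a link graph. The three-page graph below is made up, and the damping factor of 0.85 is the value suggested in the original PageRank paper; Google's real computation runs over the whole web:

    # PageRank by power iteration over a tiny made-up link graph.
    links = {            # page -> pages it links to
        "A": ["B", "C"],
        "B": ["C"],
        "C": ["A"],
    }
    damping = 0.85
    ranks = {page: 1.0 / len(links) for page in links}

    for _ in range(50):  # iterate until the ranks stabilise
        new_ranks = {}
        for page in links:
            # Each page passes its rank on, split evenly among its outgoing links.
            incoming = sum(ranks[p] / len(outs)
                           for p, outs in links.items() if page in outs)
            new_ranks[page] = (1 - damping) / len(links) + damping * incoming
        ranks = new_ranks

    print(ranks)  # pages with more (and better-ranked) incoming links score higher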

2. Popularity:

Popular things are found easily. The most-searched items, with the maximum hits on their links, appear at the top of the search results.