Wednesday, October 6, 2010

Advanced Search Engine Types



Search engines are an extremely powerful tool for promoting your business websites online and reaching target customers. Many studies have shown that between 40% and 80% of users find what they are looking for by using a search engine. One study concluded that roughly 625 million searches are performed every day!


A web search engine is designed to search for information on the World Wide Web and FTP servers. The search results are generally presented as a list of links. The great thing about search engines is that they bring targeted traffic to your website: these visitors are already motivated to make a purchase from you, because they sought you out.

With the right website optimization, the search engines can consistently deliver your site to your audience.

The four types of search engines most commonly used are:

1. Crawler-Based Search Engines
2. Human Powered Directories
3. Hybrid Search Engines
4. Meta Search Engines

1- Crawler-based search engines

Crawler-based search engines use automated software programs to survey and categorize web pages. The programs used by the search engines to access your web pages are called ‘spiders’, ‘crawlers’, ‘robots’ or ‘bots’.
A spider will find a web page, download it and analyze the information presented on it. The page will then be added to the search engine’s database. When a user performs a search, the engine checks its database of web pages for the keywords the user searched on and presents a list of link results.

The results (the list of suggested links) are presented in order of how ‘close’ they are (as judged by the ‘bots’) to what the user wants to find online. Crawler-based search engines are constantly searching the Internet for new web pages and updating their database with new or altered pages.
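To make the crawl-and-index cycle concrete, here is a minimal sketch of what a spider does, written in Python using only the standard library. The seed URL and the in-memory index are placeholders for illustration; real crawlers are distributed systems backed by persistent databases.

    # Minimal sketch of a crawler's fetch-and-index step (illustrative only).
    import urllib.request
    from html.parser import HTMLParser

    class LinkAndTextParser(HTMLParser):
        """Collects outgoing links and visible words from one HTML page."""
        def __init__(self):
            super().__init__()
            self.links, self.words = [], []

        def handle_starttag(self, tag, attrs):
            if tag == "a":
                for name, value in attrs:
                    if name == "href" and value:
                        self.links.append(value)

        def handle_data(self, data):
            self.words.extend(data.lower().split())

    def crawl_page(url, index):
        """Download one page, index its words, and return the links found on it."""
        html = urllib.request.urlopen(url, timeout=10).read().decode("utf-8", "ignore")
        parser = LinkAndTextParser()
        parser.feed(html)
        for word in parser.words:
            index.setdefault(word, set()).add(url)   # word -> pages containing it
        return parser.links

    index = {}
    new_links = crawl_page("https://example.com/", index)
    print(len(index), "terms indexed;", len(new_links), "links queued for crawling")

A real spider would keep feeding the returned links back into a crawl queue and revisit pages periodically to pick up changes.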

Examples:
Examples of crawler-based search engines are:
* Google (www.google.com)
* Ask Jeeves (www.ask.com)


2- Human Powered Directories

A human-powered directory depends on humans for its listings. A directory gets its information either from submissions, which include a short description of the entire site, or from editors who write one for the sites they review. Human editors decide which category a site belongs to and place it within that category in the directory’s database. The editors comprehensively check the website and rank it, based on the information they find, using a pre-defined set of rules.

Examples:
There are two major directories at the time of writing:
* Yahoo Directory
* Open Directory

3- Hybrid Search Engines

Hybrid search engines use a combination of crawler-based results and directory results. It is now common for crawler and human-powered results to be combined in a single set of listings. Usually, a hybrid search engine will favor one type of listing over the other; for example, MSN Search is more likely to present human-powered listings from LookSmart.
More and more search engines these days are moving to a hybrid model; MSN is one example.
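As a rough illustration of "favoring one type of listing", here is a small Python sketch that merges a directory result list with a crawler result list and gives the human-edited entries extra weight. The result lists and the 1.5 boost are invented for the example, not how MSN or any real engine actually weighted its sources.

    # Hypothetical merge of directory and crawler listings for one query.
    def merge_results(directory_hits, crawler_hits, directory_boost=1.5):
        scores = {}
        for rank, url in enumerate(crawler_hits):
            scores[url] = scores.get(url, 0) + (len(crawler_hits) - rank)
        for rank, url in enumerate(directory_hits):
            # Human-edited listings get boosted, so they tend to rise to the top.
            scores[url] = scores.get(url, 0) + directory_boost * (len(directory_hits) - rank)
        return sorted(scores, key=scores.get, reverse=True)

    print(merge_results(["a.com", "b.com"], ["b.com", "c.com", "a.com"]))
    # -> ['b.com', 'a.com', 'c.com']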

4- Meta Search Engines
Meta search engines take the results from several other search engines and combine them into one large listing.
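The aggregation step can be sketched in a few lines of Python. The per-engine result lists below are hard-coded stand-ins; a real meta search engine would query each engine live and then merge the responses.

    # Toy meta search: combine result lists from several engines into one listing.
    def meta_search(results_by_engine):
        combined = {}
        for engine, urls in results_by_engine.items():
            for rank, url in enumerate(urls, start=1):
                # Simple vote: appearing high up in many engines scores best.
                combined[url] = combined.get(url, 0) + 1.0 / rank
        return sorted(combined, key=combined.get, reverse=True)

    sample = {
        "engine_a": ["news.example", "shop.example", "blog.example"],
        "engine_b": ["shop.example", "news.example"],
    }
    print(meta_search(sample))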

Examples:
Examples of Meta search engines include:
* Metacrawler
* Dogpile

NexGen Forum provides a platform to learn, discuss, share, and find tutorials on search engine marketing, including SEO, paid marketing and affiliate marketing. Find the latest updates on search engine optimization techniques, Google AdWords, effective online marketing tactics and affiliate marketing, all at the NexGen forum.

Tuesday, October 5, 2010

Google Architecture




To engineer a search engine is a challenging task. Search engines index tens to hundreds of millions of web pages involving a comparable number of distinct terms, and they answer tens of millions of queries every day. Despite the importance of large-scale search engines on the web, very little academic research has been done on them, and it is not easy to cover the Google search engine architecture in a single article.

In this article I give a high-level overview of Google's architecture and how the whole system works. The sections below discuss the main applications and data structures. Most of Google is implemented in C or C++ for efficiency and can run on either Solaris or Linux.

The details of main components of Google Architecture are given below:

Crawlers:
In Google, web crawling (the downloading of web pages) is done by several distributed crawlers. Crawlers are automated programs that fetch website content over the web.

URL Server

There is a URL Server that sends lists of URLs to be fetched to the crawlers. The web pages that are fetched are then sent to the store server.

Store Server:
The store server then compresses and stores the web pages into a repository. Every web page has an associated ID number called a docID which is assigned whenever a new URL is parsed out of a web page. The indexing function is performed by the indexer and the sorter.
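A rough sketch of that step, assuming an in-memory repository and zlib compression (the names and structures below are invented for illustration; the real store server is a large distributed system):

    # Sketch of a store server: assign docIDs to new URLs and keep
    # compressed page contents in a simple repository dictionary.
    import zlib

    url_to_docid = {}     # URL -> docID
    repository = {}       # docID -> compressed page contents
    next_docid = 0

    def store_page(url, html):
        """Assign a docID to the URL (if it is new) and store the compressed page."""
        global next_docid
        if url not in url_to_docid:
            url_to_docid[url] = next_docid
            next_docid += 1
        docid = url_to_docid[url]
        repository[docid] = zlib.compress(html.encode("utf-8"))
        return docid

    docid = store_page("https://example.com/", "<html><body>hello world</body></html>")
    print("docID", docid, "stored,", len(repository[docid]), "compressed bytes")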


Indexer:
The indexer performs a number of functions. It reads the repository, uncompresses the documents, and parses them. Each document is converted into a set of word occurrences called hits.

The hits record the word, position in document, an approximation of font size, and capitalization. The indexer distributes these hits into a set of "barrels", creating a partially sorted forward index. The indexer performs another important function. It parses out all the links in every web page and stores important information about them in an anchors file. This file contains enough information to determine where each link points from and to, and the text of the link.
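A toy version of the hit-and-barrel idea might look like the Python below. The hit format (position plus a capitalization flag) and the barrel count are simplifications invented for the sketch, and font size is omitted entirely.

    # Sketch of the indexer: convert a document into word-occurrence "hits"
    # and distribute them into barrels, forming a partially sorted forward index.
    NUM_BARRELS = 4
    barrels = [[] for _ in range(NUM_BARRELS)]   # each entry: (docid, word, hits)

    def index_document(docid, text):
        hits_per_word = {}
        for position, word in enumerate(text.split()):
            hits_per_word.setdefault(word.lower(), []).append((position, word[0].isupper()))
        for word, hits in hits_per_word.items():
            # The target barrel depends only on the word, so each barrel stays
            # ordered by docID as documents are indexed in docID order.
            barrels[hash(word) % NUM_BARRELS].append((docid, word, hits))

    index_document(0, "Search engines index the Web")
    index_document(1, "The Web keeps growing")
    print([len(b) for b in barrels])   # how many (docid, word) entries landed in each barrel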

URL Resolver:

The URL Resolver reads the anchors file and converts relative URLs into absolute URLs and in turn into docIDs. It puts the anchor text into the forward index, associated with the docID that the anchor points to. It also generates a database of links which are pairs of docIDs. The links database is used to compute PageRanks for all the documents.
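A minimal sketch of that conversion, using Python's urljoin and an invented docID table; the (from_docID, to_docID) pairs are what a later PageRank computation would consume.

    # Sketch of the URL resolver: turn relative anchors into absolute URLs,
    # map URLs to docIDs, and record (from_docid, to_docid) link pairs.
    from urllib.parse import urljoin

    url_to_docid = {"https://example.com/": 0}
    links = []   # list of (from_docid, to_docid) pairs for the links database

    def get_docid(url):
        if url not in url_to_docid:
            url_to_docid[url] = len(url_to_docid)
        return url_to_docid[url]

    def resolve_anchor(from_url, relative_href):
        absolute = urljoin(from_url, relative_href)   # relative -> absolute URL
        links.append((get_docid(from_url), get_docid(absolute)))

    resolve_anchor("https://example.com/", "/about.html")
    resolve_anchor("https://example.com/", "https://other.example/")
    print(links)   # [(0, 1), (0, 2)]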

Sorter:

The sorter takes the barrels, which are sorted by docID (a simplification), and resorts them by wordID to generate the inverted index. This is done in place so that little temporary space is needed for the operation. The sorter also produces a list of wordIDs and offsets into the inverted index.
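Continuing the toy forward-index format from the indexer sketch above, the resorting step could be sketched like this (an in-memory stand-in, not the in-place disk sort the real sorter performs):

    # Sketch of the sorter: take forward-index barrels (grouped by docID) and
    # resort them by word to produce an inverted index of word -> postings.
    def build_inverted_index(barrels):
        inverted = {}
        for barrel in barrels:
            for docid, word, hits in sorted(barrel, key=lambda entry: entry[1]):
                inverted.setdefault(word, []).append((docid, hits))
        return inverted

    forward_barrel = [(0, "web", [(4, True)]), (1, "web", [(1, True)]), (0, "search", [(0, True)])]
    inverted = build_inverted_index([forward_barrel])
    print(inverted["web"])   # [(0, [(4, True)]), (1, [(1, True)])]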


DumpLexicon

A program called DumpLexicon takes this list together with the lexicon produced by the indexer and generates a new lexicon to be used by the searcher. The searcher is run by a web server and uses the lexicon built by DumpLexicon together with the inverted index and the PageRanks to answer queries.
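Put together, query answering can be sketched as below, with the lexicon, inverted index and PageRank values stubbed in as tiny hand-made dictionaries purely for illustration:

    # Toy searcher: look query words up in the lexicon, intersect their posting
    # lists in the inverted index, and order the matching docIDs by PageRank.
    lexicon = {"search": 0, "engine": 1}                  # word -> wordID
    inverted_index = {0: {10, 11, 12}, 1: {11, 12, 13}}   # wordID -> set of docIDs
    pagerank = {10: 0.4, 11: 0.9, 12: 0.6, 13: 0.2}       # docID -> PageRank score

    def search(query):
        word_ids = [lexicon[w] for w in query.lower().split() if w in lexicon]
        if not word_ids:
            return []
        matching = set.intersection(*(inverted_index[w] for w in word_ids))
        return sorted(matching, key=lambda d: pagerank.get(d, 0), reverse=True)

    print(search("search engine"))   # [11, 12], ordered by PageRank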

NexGen Forum provides a platform to learn, discuss, share, and find tutorials on search engine marketing, including SEO, paid marketing and affiliate marketing. Find the latest updates on SEO (search engine optimization) techniques, Google AdWords, effective online marketing tactics and affiliate marketing, all at the NexGen forum.




Search Engine Saturation - Important Factor in Google Ranking




Search engine saturation is a term relating to the number of URLs from a specific web site included in any given search engine. It is basically a metric for how well your site is represented in search engine listings. The higher the saturation level, i.e. the more pages indexed in a search engine, the higher the potential traffic levels and rankings.

Saturation implies there is a bar or metric that allows you to determine how much of something has been touched, absorbed, transformed, etc. With respect to search engine indexing, there are different types of saturation.

For example, given a list of X search engines, you achieve 100% search engine saturation if your site is found in all X search engines (although that is perhaps the crudest of metrics as it would be 100% even if one search engine indexes a single page whereas another indexed 1000 pages). Using that same list of X search engines, you can also (or alternatively) say you achieve 100% search engine saturation if and only if all X search engines index every page on your site.

Let’s say you have a website with 100 pages. If 90 of those pages are indexed at Google, 75 at Yahoo!, 80 at MSN, and 65 at Ask.com, then your search engine saturation is the sum of those: 310. You could include smaller search engines like Mahalo, Dogpile, and the several thousand others out there, but then counting pages would never end, so I just stick with the big four. It doesn’t matter that there is overlap in the indexing; if 65 of the pages indexed at Yahoo! are also indexed at Google, that doesn’t change your raw number. Your SES is a cumulative rating.
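The arithmetic is easy to reproduce; the per-engine counts below are simply the hypothetical figures from the example above.

    # Cumulative search engine saturation (SES) for the 100-page example site.
    pages_indexed = {"Google": 90, "Yahoo!": 75, "MSN": 80, "Ask.com": 65}
    ses = sum(pages_indexed.values())
    print("Search engine saturation:", ses)   # 310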

Crawl saturation measures how much of your site a search engine actually fetches.
Index saturation measures how much of your site is listed in the search engine’s index. The two need not match, because a search engine has the option of listing documents it has found links to but has not yet fetched.

For example, you could use a working definition of full index saturation that requires a page to have been crawled and fully indexed by the search engine; or you could require only that a page has been fetched; or you could require that a page has a cache link in the search engine’s listings.

Most if not all site searches in Google produce limited results. That is, they won't show you all the pages that Google has crawled/fetched and they won't show you which pages are in the Main Web Index and which pages are in the Supplemental Index.

NexGen Forum provides a platform to learn, discuss, share, and find tutorials on search engine marketing, including SEO, paid marketing and affiliate marketing. Find the latest updates on SEO (search engine optimization) techniques, Google AdWords, effective online marketing tactics and affiliate marketing, all at the NexGen forum.

Monday, October 4, 2010

Choosing the Best CMS for your Website




Some of you might be aware of what a CMS is and some of you might not be. Anyway, a Content Management System is a software application that helps users manage the content of their websites easily and efficiently. Even a person with limited technical knowledge can easily operate a CMS, as it performs most of the tasks for you.

Let us have a glance at some of the popular CMS available:

WordPress:

WordPress, an open source CMS, enables users to organize, manage and publish the content of a website. Today, most organizations have realized the advantages of using WordPress as a CMS, and it has become quite popular. It supports one blog per installation and has a rich suite of useful widgets and attractive themes. It also offers pingback and trackback features.

WordPress enables users to structure the content of their websites easily so that it gets indexed by search engines quickly. It also lets users customize their URLs, helping them include the most relevant keywords. Its kit of plug-ins provides links to a wide array of social media websites and brings clear SEO advantages.

Joomla:

Joomla, an affordable open source CMS, allows multiple users to access, organize, manage and publish the content of a website. It is one of the most popular and most extensively used CMSes on the Web. This award-winning CMS also enables users to build online applications and makes uploading content easy, fast and effective.

It is quite easy to use and suits all kinds of websites, from simple to complex. It has a rich repertoire of extensions, which run within the Joomla environment and add functionality. Most Joomla extensions are available free of cost. Using Joomla you can very easily add useful features to your website.

Drupal:

Drupal, a back-end CMS, is used to set up forums, blogs and all kinds of websites. It supports multiple user accounts, RSS feeds, customizable layouts and more. Written in PHP, Drupal runs on all kinds of computing platforms and is SEO-friendly. It has an extensible code base and a powerful theme management system.

Using Drupal, users can organize and publish the content of their websites quite easily and efficiently. The Drupal core includes features that let users register and maintain individual user accounts within a privilege system. No special programming skills are required to set up a website, as Drupal offers a decent administration interface.

There are some good web hosting providers, such as LimeDomains, FatCow and HostGator, that offer reliable and affordable WordPress hosting, Joomla hosting, Drupal hosting and more.
So what are you waiting for? Choose a CMS and get started today!

NexGen Forum is a platform for discussion of open source web design and development. Web developers can find the latest updates on open source products including Joomla, Mambo, Drupal and CakePHP, discuss their problems, and download free HTML templates, Monster templates, WordPress templates, Joomla templates and vBulletin templates.