Latent Semantic Indexing Concepts
Contributed by John Martin, Sunday, 23 September 2007
Latent Semantic Indexing (LSI) is a concept that Google has
begun to employ. The technique itself dates back to the late
1980s, but its use in web search is more recent. It was
originally used in Google's AdSense program, as a way of seeing
which adverts would be the most relevant on a particular site.
Google bought a company called Applied Semantics in 2003, in an
effort to use LSI concepts and ideas in its search rankings,
and many other search engines are beginning to follow suit.
In basic, non-mathematical terms, LSI is the ability of a
search engine to search for websites on the internet the same
way a human would. In other words, the search engine looks for
relevance and quality, rather than just keywords or links going
in and out of the site. Keywords and links were the way the
search engines used to do things (the link-based part was known
at Google as PageRank), but they found that this had a number of
weaknesses. Firstly, webmasters or SEO 'experts' who cheated
would come out on top, by simply loading a site full of
irrelevant keywords, writing content of shockingly poor quality,
or using link farms extensively. Many sites would just produce
further links to other irrelevant sites, all to sell the site
itself and make money from traffic or AdSense. The old
keyword-and-link system therefore penalized perfectly good sites
(sites with good content, or that added content too quickly, or
that were new), as it relied on links, votes and keywords. Most
internet users have at some point been served irrelevant sites
at the top of search engine rankings, and so the search engines
have been trying their best to get these sites off the rankings
to create a cleaner and higher quality internet experience.
Looking at LSI in more detail, it's easy for us to begin to
see how to structure and build our web pages correctly. LSI's
algorithm works by scanning your website for keywords, and then
comparing the relationships between those keywords and the
passages around them. It does this by scanning other websites
that have the same keywords (or a similar concentration of those
keywords) and finding related words and phrases. LSI is said to
go so far as to also check grammar, terminology, spelling and
the like, both on sites already indexed and on your website.
Basically, what it is doing is checking the overall theme of
your website, whether it matches what the user is searching for,
and how it ranks against other sites in terms of keyword
relevance. The most relevant site wins.
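The idea of finding related words by looking at other pages that
share the same keywords can be sketched in a few lines of Python.
This is only an illustration under stated assumptions: the three
pages in PAGES and the helper names (term_vectors, cosine,
related_terms) are made up for this example, and plain cosine
similarity over word counts stands in for whatever a real search
engine does.

```python
from collections import Counter
from math import sqrt

# Hypothetical mini-corpus standing in for already-indexed pages.
PAGES = {
    "page_a": "cellphone plans cellphone battery reviews",
    "page_b": "mobile phone battery reviews mobile plans",
    "page_c": "garden tools garden furniture",
}

def term_vectors(pages):
    """Map each word to a list of its counts, one entry per page."""
    names = sorted(pages)
    vectors = {}
    for col, name in enumerate(names):
        for term, count in Counter(pages[name].lower().split()).items():
            vectors.setdefault(term, [0] * len(names))[col] = count
    return vectors

def cosine(u, v):
    """Cosine similarity of two count vectors (0.0 if either is empty)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def related_terms(term, pages, top=3):
    """Rank other words by how similarly they are spread across pages."""
    vectors = term_vectors(pages)
    target = vectors[term]
    scores = [(other, cosine(target, vec))
              for other, vec in vectors.items() if other != term]
    # Highest similarity first; ties are broken alphabetically.
    return sorted(scores, key=lambda pair: (-pair[1], pair[0]))[:top]

for word, score in related_terms("battery", PAGES, top=2):
    print(f"{word}: {score:.2f}")
```

In this toy corpus "plans" and "reviews" come out as the words
most related to "battery", because they appear on exactly the
same pages, while the gardening words score zero.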
For example, if you search for “cellphone” on a search
engine using only keywords and links, it will display the sites
that have the most mentions of “cellphone” or the most links
pertaining to it. But under LSI, a search for “cellphone” also
displays sites that use the phrases “mobile phone” or “cellular
phone”, or anything else that is relevant. What this means is
that stuffing keywords into sites and articles will not win you
a higher ranking, but quality, relevant content will. Website
developers and writers who have been doing website optimization
on ethical, sound quality principles now finally come out on
top, while irrelevant and rubbish sites are thrown off the
rankings completely. The better the quality and relevance of the
site, the better the performance.
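The “cellphone” example can be tried out on a toy scale. The
sketch below is an assumption-laden illustration, not a
description of how any search engine implements LSI: the five
pages in PAGES are invented, the function names are hypothetical,
and a simple power iteration in pure Python is swapped in for the
singular value decomposition that LSI is normally built on. It
reduces the word counts to two latent "topics" and then compares
the query with each page in that reduced space.

```python
from math import sqrt

# Hypothetical mini-corpus; page names and texts are illustrative only.
PAGES = {
    "phones_1": "cellphone battery charger",
    "phones_2": "mobile phone battery charger",
    "phones_3": "cellphone screen battery",
    "garden_1": "garden soil plants",
    "garden_2": "garden plants water",
}

def build_vocab(pages):
    """Sorted list of every distinct word in the corpus."""
    return sorted({word for text in pages.values() for word in text.split()})

def to_vector(text, vocab):
    """Count how often each vocabulary word occurs in the text."""
    words = text.split()
    return [float(words.count(term)) for term in vocab]

def top_axes(doc_vectors, k, steps=200):
    """Approximate the k strongest latent axes of the corpus.

    Runs power iteration with deflation on the word-by-word
    co-occurrence matrix (a stand-in for a proper SVD routine).
    """
    size = len(doc_vectors[0])
    matrix = [[sum(d[i] * d[j] for d in doc_vectors) for j in range(size)]
              for i in range(size)]
    axes = []
    for _axis in range(k):
        v = [1.0] * size
        for _step in range(steps):
            w = [sum(row[j] * v[j] for j in range(size)) for row in matrix]
            norm = sqrt(sum(x * x for x in w))
            v = [x / norm for x in w]
        mv = [sum(row[j] * v[j] for j in range(size)) for row in matrix]
        strength = sum(a * b for a, b in zip(v, mv))
        # Deflate: remove the axis just found so the next pass finds another.
        matrix = [[matrix[i][j] - strength * v[i] * v[j] for j in range(size)]
                  for i in range(size)]
        axes.append(v)
    return axes

def project(vector, axes):
    """Coordinates of a word-count vector in the latent space."""
    return [sum(a * x for a, x in zip(axis, vector)) for axis in axes]

def cosine(u, v):
    """Cosine similarity of two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def lsi_scores(query, pages, k=2):
    """Score every page against the query in the k-dimensional latent space."""
    vocab = build_vocab(pages)
    docs = {name: to_vector(text, vocab) for name, text in pages.items()}
    axes = top_axes(list(docs.values()), k)
    q = project(to_vector(query, vocab), axes)
    return {name: cosine(q, project(vec, axes)) for name, vec in docs.items()}

scores = lsi_scores("cellphone", PAGES)
for name in sorted(scores, key=scores.get, reverse=True):
    print(f"{name}: {scores[name]:.2f}")
```

With this corpus the page "phones_2" never mentions “cellphone”,
yet it scores close to 1 because it shares "battery" and
"charger" with the pages that do, while the two gardening pages
score close to 0. A plain keyword count would have given
"phones_2" nothing at all.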
Source: http://www.LatentSemanticIndexing.com