Search Engine News


...the search industry queries new media


Search marketing in the new media era.

October 07, 2004
 
Web 2.0 - Exclusive Demonstration of Clustering from Google
You may think that the new book service and SMS tools from Google were the latest news from the Mountain View search engine. You'd be wrong!

At today's Web 2.0 (Web2con.com) "From the Labs" session, Peter Norvig, Ph.D., Director of Search Quality at Google, revealed that the search engine is using the largest clustering database in the world.

Norvig said he wanted to share three initiatives that Google Labs had been working on to improve search. He said these developments were focused on “understanding the meaning” of search, and admitted that Google needed to go beyond “just keywords and linking structures of the web”.

All three tasks are aimed at helping Google better understand the web:

1. Statistical machine translation
2. Named entities
3. Word clusters

Statistical machine translation

This involves understanding the syntax and semantics of different languages, including machine languages. Norvig said Google has access to more text and computing resources than anyone has ever had before, and that it has hired some of the leading researchers in the field to take search to the next step. He then showed examples of Google’s current translation capabilities, from Arabic to English and from Chinese to English, demonstrating that the technology has improved but that the translations still have problems.
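Norvig did not describe how Google's translation system works. The textbook "statistical" approach is the noisy-channel model: choose the English sentence e that maximizes P(e) * P(f | e), where P(e) is a language model of fluent English and P(f | e) is a translation model learned from parallel text. The Python sketch below is a toy illustration of that idea under heavy assumptions (a one-to-one word alignment, a hand-written bigram language model, invented French-to-English probabilities). It is not Google's system.

```python
import math
from itertools import permutations, product

# Hypothetical translation model t(f | e): for each source (French) word,
# the English words that may have produced it, with invented probabilities.
TRANSLATION = {
    "la": {"the": 0.9},
    "maison": {"house": 0.7, "home": 0.3},
    "bleue": {"blue": 1.0},
}

# Hypothetical bigram language model P(e_i | e_{i-1}); "<s>" marks the sentence start.
BIGRAM = {
    ("<s>", "the"): 0.5,
    ("the", "house"): 0.4,
    ("the", "home"): 0.1,
    ("the", "blue"): 0.2,
    ("blue", "house"): 0.3,
    ("blue", "home"): 0.05,
    ("house", "blue"): 0.01,
    ("home", "blue"): 0.01,
}
UNSEEN_BIGRAM = 1e-6  # crude floor for bigrams not listed above


def lm_logprob(words):
    """log P(e) under the toy bigram language model."""
    prev, total = "<s>", 0.0
    for word in words:
        total += math.log(BIGRAM.get((prev, word), UNSEEN_BIGRAM))
        prev = word
    return total


def decode(source):
    """Return the English word sequence e maximizing log P(e) + log P(f | e)."""
    best, best_score = None, -math.inf
    # Pick one English word per source word (a one-to-one alignment).
    for choice in product(*(TRANSLATION[f].items() for f in source)):
        tm_score = sum(math.log(p) for _, p in choice)
        # Let the language model decide the word order.
        for order in permutations([e for e, _ in choice]):
            score = lm_logprob(order) + tm_score
            if score > best_score:
                best, best_score = list(order), score
    return best


print(" ".join(decode(["la", "maison", "bleue"])))  # the blue house
```

The translation model only proposes candidate words; the language model is what prefers the fluent "the blue house" over the word-for-word "the house blue", which is one reason large amounts of monolingual text help this kind of system.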

Named entities

Google’s biggest challenge is understanding which words it needs to discard when analyzing the content of a page. The goal is to “discard the noisy data and break it into sentences”. Norvig explained that Google doesn’t need to analyze all of a page’s content; it can “just deal with the patterns of sentences.” His example was text containing the phrase “such as”, which typically introduces examples of a category. Google is working to extract these key identifiers and remove content that doesn’t match the pattern.
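Norvig gave only the "such as" phrase as an example. One well-known way to exploit such phrases is a lexico-syntactic (Hearst-style) pattern: in "search engines such as Google, Yahoo and Ask Jeeves", the word before "such as" names a category and the capitalized items after it are likely named entities belonging to it. The Python sketch below assumes that simple pattern, a crude sentence splitter, and capitalization as a stand-in for entity detection; the function name `extract_entities` is hypothetical and this is not Google's extractor.

```python
import re
from collections import defaultdict

# "<category> such as <list>", with the list running to the end of the clause.
SUCH_AS = re.compile(r"(\w+)\s*,?\s+such as\s+([^.;:!?]+)")
# Separators inside the list: commas (optionally followed by "and"/"or") or a bare "and"/"or".
SEPARATOR = re.compile(r"\s*,\s*(?:and\s+|or\s+)?|\s+(?:and|or)\s+")


def extract_entities(text):
    """Map each category word to the capitalized names listed after 'such as'."""
    found = defaultdict(set)
    # Crude sentence split on terminal punctuation followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        for category, items in SUCH_AS.findall(sentence):
            for item in SEPARATOR.split(items):
                item = item.strip()
                # Capitalization is used as a rough proxy for "named entity".
                if item and item[0].isupper():
                    found[category.lower()].add(item)
    return dict(found)


text = ("We compared search engines such as Google, Yahoo and Ask Jeeves. "
        "Visitors like cities such as Paris, Rome or London.")
print(extract_entities(text))
```

Because the list is taken up to the end of the clause, a sentence like "cities such as Paris or London are popular" would wrongly yield "London are popular"; finding where the list really ends needs part-of-speech tagging or a parser.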

Norvig then displayed a chart showing clusters of related concepts that overlap. Google looks for overlap between these clusters and then extracts the key data from the overlapping areas to determine relevancy.
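The talk did not say how Google measures overlap between concept clusters. As a hypothetical illustration only, the Python sketch below treats each cluster as a set of terms, scores every pair by Jaccard similarity (shared terms divided by all terms in either cluster), and keeps the shared terms of pairs above an arbitrary threshold. The cluster names, terms and the 0.2 threshold are invented.

```python
from itertools import combinations


def overlapping_terms(clusters, min_jaccard=0.2):
    """Return (cluster_a, cluster_b, shared_terms) for pairs whose Jaccard similarity is at least min_jaccard."""
    result = []
    for (name_a, terms_a), (name_b, terms_b) in combinations(clusters.items(), 2):
        union = terms_a | terms_b
        if not union:
            continue  # two empty clusters share nothing meaningful
        shared = terms_a & terms_b
        if len(shared) / len(union) >= min_jaccard:
            result.append((name_a, name_b, sorted(shared)))
    return result


# Hypothetical concept clusters.
clusters = {
    "search engines": {"google", "yahoo", "ranking", "index", "crawler"},
    "web directories": {"yahoo", "dmoz", "index", "categories"},
    "cooking": {"recipe", "oven", "flour"},
}
print(overlapping_terms(clusters))
# [('search engines', 'web directories', ['index', 'yahoo'])]
```

Here the two web-related clusters share 2 of 7 distinct terms (about 0.29), so their overlap is kept, while neither shares anything with the cooking cluster.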

Word clusters

This led Norvig to the third problem that Google is working on in its labs. The problem with web search, he said, is that a keyword can have several meanings, and the results displayed may not reflect the meaning you intended. This is why Google is building the largest Bayesian database of clusters, to determine the most likely meaning of any given search request.
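Norvig did not explain how the Bayesian cluster database is queried. A minimal sketch of the general idea, assuming each meaning (cluster) of an ambiguous word has a prior probability and a distribution over associated words, is naive Bayes: P(meaning | words) is proportional to P(meaning) times the product of P(word | meaning) over the other query words. The ambiguous word "jaguar", its two senses and all probabilities below are invented for illustration.

```python
import math

# Hypothetical senses (clusters) for the ambiguous query word "jaguar".
PRIOR = {"car": 0.6, "animal": 0.4}
LIKELIHOOD = {
    "car": {"price": 0.05, "dealer": 0.04, "engine": 0.05,
            "habitat": 0.001, "rainforest": 0.001},
    "animal": {"price": 0.002, "dealer": 0.001, "engine": 0.001,
               "habitat": 0.05, "rainforest": 0.04},
}
UNSEEN = 1e-4  # smoothing for words missing from a sense's distribution


def rank_senses(context):
    """Rank senses by posterior P(sense | context), treating words as independent given the sense."""
    log_scores = {
        sense: math.log(prior)
        + sum(math.log(LIKELIHOOD[sense].get(w, UNSEEN)) for w in context)
        for sense, prior in PRIOR.items()
    }
    # Convert log scores to normalized posteriors (subtracting the max avoids underflow).
    top = max(log_scores.values())
    weights = {s: math.exp(v - top) for s, v in log_scores.items()}
    total = sum(weights.values())
    return sorted(((s, w / total) for s, w in weights.items()),
                  key=lambda pair: pair[1], reverse=True)


print(rank_senses(["rainforest", "habitat"])[0][0])  # animal
print(rank_senses(["dealer", "price"])[0][0])        # car
```

With no context words the ranking falls back to the priors, so the more common meaning wins; each extra query word shifts the posterior toward the cluster it is most associated with.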

Norvig gave a live example from the labs of how clustering can be applied to search. He started with the phrase “search engine” and showed the different clusters related to that term. The output was very raw, but it demonstrated how Google can identify clusters related to a search term and order them by decreasing likelihood of being related.
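The demo ordered clusters by decreasing likelihood of being related to the query, without saying how that likelihood was computed. A simple stand-in, assumed here purely for illustration, is to estimate P(cluster | term) from co-occurrence: the fraction of documents containing the term that also contain at least one word from the cluster. The documents, clusters and the `rank_clusters` name are hypothetical.

```python
def rank_clusters(term, clusters, documents):
    """Rank clusters by the estimated P(some cluster word appears | term appears)."""
    with_term = [doc for doc in documents if term in doc]
    if not with_term:
        return []  # no evidence about this term
    scores = {
        name: sum(1 for doc in with_term if doc & words) / len(with_term)
        for name, words in clusters.items()
    }
    # Most likely related clusters first.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical documents, each reduced to a set of terms.
documents = [
    {"search engine", "google", "ranking"},
    {"search engine", "crawler", "index"},
    {"search engine", "google", "ads"},
    {"recipe", "oven"},
]
clusters = {
    "technology": {"crawler", "index", "ranking", "ads"},
    "companies": {"google", "yahoo"},
    "cooking": {"recipe", "oven"},
}
for name, score in rank_clusters("search engine", clusters, documents):
    print(f"{name}: {score:.2f}")
# technology: 1.00
# companies: 0.67
# cooking: 0.00
```

A raw co-occurrence fraction favors clusters of very common words; a production system would likely normalize for how frequent each cluster is overall, but that refinement is left out to keep the sketch short.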

At the suggestion of an audience member, Norvig entered the phrase “George Bush” and displayed the clusters related to that search. At the top of the list were variations on the president’s name, but further down were broader keywords that were clearly relevant to the term “George Bush”. The phrase “miserable failure” did not appear, but Norvig did point out that the term “idiot” clearly appeared in clusters matching the president. A similar example using “John Kerry” revealed connections to John Edwards and Howard Dean but, to the laughter of the audience, no mention of the term “idiot” in the related clusters.

From Norvig's demonstration, it is clear that Google is working on ways to improve and innovate its search technology for the future. While start-up Vivisimo already has clustering technology on the market with the launch of Clusty, there is no doubt that Google has the resources and R&D to make clustering technology the "PageRank" of the future.




© 2006 Search Engine Lowdown. All Rights Reserved.
All views and opinions expressed are those of the author only,
protected by the First Amendment and are not representative of any company listed. All trademarks, slogans, text or logo representation used or referred to in this website are the property of their respective owners.