lucene-java-user mailing list archives

From Peter W.
Subject Re: Lucene Internals question
Date Mon, 22 Jan 2007 21:42:22 GMT

Lucene returns the best documents for a given query. PageRank uses link
analysis with similar results, but requires a large set of metadata to

Scoring in Lucene delivers pure search, while PageRank attempts to
establish source authority.
I'm not strong in math; those who are can find an explanation of the
latter here:


Peter W.
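
To make the distinction concrete, here is a rough sketch rather than
Lucene's or Google's actual code. The first function is a query-dependent
score loosely modeled on Lucene's classic tf-idf Similarity (it leaves out
coord, queryNorm and boosts). The second is a query-independent authority
score computed by power iteration over a link graph. The names
tfidf_score, pagerank, corpus and links are made up for this illustration.

```python
import math
from collections import Counter


def tfidf_score(query_terms, doc_tokens, corpus):
    """Query-dependent score for one document (simplified tf-idf).

    Assumption: tf = sqrt(freq), idf = 1 + ln(numDocs / (docFreq + 1)),
    lengthNorm = 1 / sqrt(numTerms). coord, queryNorm and boosts are omitted.
    """
    num_docs = len(corpus)
    counts = Counter(doc_tokens)
    length_norm = 1.0 / math.sqrt(len(doc_tokens)) if doc_tokens else 0.0
    score = 0.0
    for term in query_terms:
        freq = counts[term]
        if freq == 0:
            continue  # term absent from this document: contributes nothing
        doc_freq = sum(1 for doc in corpus if term in doc)
        tf = math.sqrt(freq)
        idf = 1.0 + math.log(num_docs / (doc_freq + 1))
        score += tf * idf * idf * length_norm
    return score


def pagerank(links, damping=0.85, iterations=50):
    """Query-independent authority score via power iteration.

    links maps each page to the list of pages it links to. Every link
    target must also appear as a key (hypothetical input format).
    """
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outgoing in links.items():
            if outgoing:
                share = damping * rank[page] / len(outgoing)
                for target in outgoing:
                    new_rank[target] += share
            else:
                # Dangling page: spread its rank evenly over all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Illustrative data only.
corpus = [
    ["lucene", "scoring", "lucene"],
    ["pagerank", "link", "analysis", "lucene", "and", "more", "words", "here"],
    ["web", "crawler"],
]
scores = [tfidf_score(["lucene"], doc, corpus) for doc in corpus]
ranks = pagerank({"a": ["c"], "b": ["c"], "c": ["a"]})
```

The tf-idf score changes whenever the query changes, while the ranks come
out the same no matter what is being searched for. Two documents that both
contain a term are separated in the first function by how often the term
occurs and how short the document is.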

On Jan 22, 2007, at 12:00 PM, Mark Miller wrote:

> Well, first Lucene checks all of the other documents in the world
> for any that refer to the document that you're adding to
> Lucene...and then...oh wait...
> Similarity.html
>> Hmm, doesn't Lucene scoring determine how relevant a document is to your
>> query? That is what PageRank and HITS do as well, I believe. Page and
>> document are the same: if you want to index a page, you'll obviously try
>> to convert it into a document. PageRank does link analysis to determine
>> how relevant that page is as it relates to the query you entered. Does
>> Lucene have something similar? How does Lucene decide which of two
>> documents should score higher if they both contain a certain term?
>> Google uses PageRank to make that determination; how does Lucene do it?
>> On 1/22/07, Nicolas Lalevée wrote:
>>> On Monday 22 January 2007 19:33, EDMOND KEMOKAI wrote:
>>> > Hi All
>>> > This is a question for those familiar with Lucene document scoring.
>>> > How does it compare with Google's PageRank or HITS, or are they very
>>> > different? I have been looking at the PageRank algorithm, but I'll
>>> > need to brush up on my math skills before delving into it :)
>>> In fact, Lucene is just a search engine. You can then use the search
>>> engine to search web pages, as Nutch does with Lucene. Google is more
>>> like Nutch: a web crawler plus a web search engine. So when you are
>>> talking about page ranking, it has nothing to do with Lucene scoring.
>>> Lucene scoring is about how well a result entry matches your query.
>>> Page ranking is more about how relevant the web page is. So for a
>>> document, the Lucene score depends on the query, while the page rank
>>> is quite absolute.
>>> Nicolas
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail:
>>> For additional commands, e-mail: