lucene-java-user mailing list archives

From Yang
Subject lucene algorithm ?
Date Wed, 25 Apr 2012 21:13:06 GMT
I read Doug Cutting's paper "Space Optimizations for Total Ranking".

Since it was written a long time ago, I wonder what algorithms Lucene uses
today for postings list traversal, score calculation, and ranking.

In particular, the total ranking algorithm described there has to traverse
the entire postings list of every query term. For a query made of very
common terms, like "yellow dog", either of the 2 terms may have an
extremely long postings list in a web search setting. Are these lists
really traversed in full in current Lucene/Solr, or are heuristics
actually employed to truncate them?
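
To make the concern concrete, here is a minimal sketch of the exhaustive
traversal I have in mind. It is not Lucene's code and not the paper's exact
algorithm: the function name rank_exhaustively, the in-memory postings
dictionary, and the integer term weights are all made up for illustration.
It only shows that the work grows with the total length of the postings
lists, not with k.

```python
import heapq
from collections import defaultdict


def rank_exhaustively(postings, query_terms, k):
    """Score every document found in any query term's postings list.

    postings maps a term to a list of (doc_id, weight) pairs; the weights
    are hypothetical per-term contributions to the document score.
    Returns the top-k (doc_id, score) pairs and the number of postings read.
    """
    scores = defaultdict(float)
    visited = 0
    for term in query_terms:
        # Every posting of every term is read; nothing is skipped or truncated.
        for doc_id, weight in postings.get(term, []):
            scores[doc_id] += weight
            visited += 1
    # Only the final selection depends on k.
    top = heapq.nlargest(k, scores.items(), key=lambda item: item[1])
    return top, visited


# Toy index with invented weights.
postings = {
    "yellow": [(1, 5), (2, 1), (4, 3)],
    "dog": [(2, 7), (3, 2), (4, 4)],
}
top, visited = rank_exhaustively(postings, ["yellow", "dog"], k=2)
print([doc_id for doc_id, _ in top], visited)
```

In this toy index all 6 postings are read to produce 2 results; with web
scale lists for "yellow" and "dog" the same loop would read every entry.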

When returning the top-k results, I can see that partitioning the postings
lists across multiple machines and then merging the top-k from each
partition would work. But if we are required to return "the 100th result
page", i.e. the results ranked 991st to 1000th (at 10 per page), each
partition would still have to find its own top 1000, so partitioning would
not help much.
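
A small sketch of the deep-paging problem as I understand it. The names
shard_top_n and fetch_page, and the (score, doc_id) hit tuples, are
hypothetical; this is a generic scatter-gather merge, not a description of
how Lucene or Solr implements it.

```python
import heapq


def shard_top_n(shard_hits, n):
    """Return the n best (score, doc_id) hits from one partition."""
    return heapq.nlargest(n, shard_hits, key=lambda hit: hit[0])


def fetch_page(shards, page, page_size):
    """Merge per-partition results and return one page of the global ranking.

    Each partition must return page * page_size hits, because in the worst
    case the whole requested page lives on a single partition.
    """
    needed = page * page_size
    candidates = []
    for shard_hits in shards:
        candidates.extend(shard_top_n(shard_hits, needed))
    merged = heapq.nlargest(needed, candidates, key=lambda hit: hit[0])
    start = (page - 1) * page_size
    return merged[start:needed], len(candidates)


# 40 toy documents split over 2 partitions; the score equals the doc id.
shards = [
    [(doc_id, doc_id) for doc_id in range(40) if doc_id % 2 == part]
    for part in range(2)
]
page_hits, shipped = fetch_page(shards, page=3, page_size=5)
print([doc_id for _, doc_id in page_hits], shipped)
```

Here page 3 holds only 5 hits, yet each partition has to produce 15
candidates, so 30 hits are collected to return 5. For the 100th page at 10
per page, every partition would have to produce 1000.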

Overall, are there any up-to-date, detailed docs on the internal
algorithms of Lucene?

Thanks a lot.
