lucene-java-user mailing list archives

From Chris Hostetter
Subject Re: Performance guarantees and index format
Date Thu, 07 Feb 2008 22:22:05 GMT

: I think this would be too messy - currently we can be sure of the simple rule
: that documents added to the index get incrementally higher docids, i.e. the
: higher the docid the more recent is the document. I think it would be much
: simpler to write a FilterIndexReader that simply reverses the order of docids.
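As a rough illustration of the reversal suggested in the quoted text: if docids run from 0 to maxDoc - 1 in insertion order, a reversed view just mirrors each id. The sketch below is plain Java and does not use Lucene's FilterIndexReader; the class name DocIdReversal and the method reverse are hypothetical names chosen for this example.

```java
// Hypothetical helper, not part of Lucene: mirrors docids so that the
// most recently added document (highest original docid) becomes docid 0.
final class DocIdReversal {
    private DocIdReversal() {}

    /** Maps a docid in [0, maxDoc) to its mirrored position maxDoc - 1 - docId. */
    static int reverse(int docId, int maxDoc) {
        if (docId < 0 || docId >= maxDoc) {
            throw new IllegalArgumentException("docId out of range: " + docId);
        }
        return maxDoc - 1 - docId;
    }

    public static void main(String[] args) {
        int maxDoc = 5; // assumed index size for the demo
        for (int d = 0; d < maxDoc; d++) {
            System.out.println(d + " -> " + reverse(d, maxDoc));
        }
    }
}
```

The mapping is its own inverse, so the same function translates ids in both directions. It only reflects recency while the "higher docid is newer" rule actually holds.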

First off: you only have that guarantee while indexing ... if you 
frequently reorder docs using something like the IndexSorter then that 
rule no longer applies (and you must not care or you wouldn't have 
reordered everything)

Second: using IndexSorter after an index is completely built is definitely 
a simpler, cleaner way of accomplishing something like this -- but it 
only seems adequate for situations in which "index building" is separate 
and distinct from "index searching" ... I can't see how it would work very 
easily in situations where you are continuously performing incremental 
updates while searches are taking place.

: The issue with Nutch's IndexSorter is that it allows you to reorder docids in
: an arbitrary manner, using a user-supplied mapping between old and new docids,
: which can be based on values retrieved from the current index or from any
: other source. So I think this would be the main value of this class applicable
: to various scenarios.
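To make the "mapping between old and new docids" concrete: one way to picture it is a permutation array built by sorting documents on some key. The sketch below is plain Java under stated assumptions, not Nutch's IndexSorter API; the class DocIdMapping, the method oldToNew, and the per-document sortKeys array (for example, a timestamp per document) are all hypothetical.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical illustration of an old-to-new docid mapping; not Nutch code.
final class DocIdMapping {
    private DocIdMapping() {}

    /**
     * Builds a permutation where result[oldDocId] == newDocId, ordering
     * documents by descending sort key. Ties keep their original relative
     * order because Arrays.sort on object arrays is stable.
     */
    static int[] oldToNew(long[] sortKeys) {
        Integer[] order = new Integer[sortKeys.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = i; // start with the identity order of old docids
        }
        // Sort old docids so the largest key comes first.
        Arrays.sort(order,
                Comparator.comparingLong((Integer old) -> sortKeys[old]).reversed());

        int[] oldToNew = new int[sortKeys.length];
        for (int newId = 0; newId < order.length; newId++) {
            oldToNew[order[newId]] = newId; // invert: position in sorted order is the new id
        }
        return oldToNew;
    }

    public static void main(String[] args) {
        long[] keys = {10L, 30L, 20L}; // assumed per-document keys, indexed by old docid
        System.out.println(Arrays.toString(oldToNew(keys)));
    }
}
```

The keys here could come from values stored in the index or from any external source; only the resulting permutation matters to whatever rewrites the index.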

No argument whatsoever.  IndexSorter seems like a sweet tool to have in 
the Lucene toolbox for letting people reorder the docs in an index by 
arbitrary criteria ... but for people with the specific case of 
*preferring* that recently added docs be in front of older docs, automatic 
segment reordering seems like it would also be a handy tool to have in the 
toolbox so that documents could "bubble up" gradually.  (maybe as a new 
MergePolicy? ... probably need some API changes to allow order to be 

There would definitely be tradeoffs people would need to consider before 
using it -- but those tradeoffs would probably also apply if they wanted 
to use IndexSorter.

