lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Doron Cohen" <cdor...@gmail.com>
Subject Re: Performance guarantees and index format
Date Fri, 08 Feb 2008 15:09:52 GMT
I was once involved in modified a search index
implementation (not Lucene) to write posting lists so that
they can be traversed (only) in reverse order. Docids
were preserved but you got higher IDs first. This was
a non-trivial code change.

Now the suggestion to (optionally) order merged
segments from new to old should be much simpler
to implement (I think) and would be an interesting add-on.

If in addition DocumentsWriter is modified to optionally
reverse the order of written docs, you get the docs
completely reversed.

Being optional, applications caring about docids
stability would not use this option.

On Fri, Feb 8, 2008 at 12:22 AM, Chris Hostetter <hossman_lucene@fucit.org>
wrote:

>
> : I think this would be too messy - currently we can be sure of the simple
> rule
> : that documents added to the index get incrementally higher docids, i.e.
> the
> : higher the docid the more recent is the document. I think it would be
> much
> : simpler to write a FilterIndexReader that simply reverses the order of
> docids.
>
> First off: you only have that garuntee while indexing ... if you
> frequently reorder docs using something like the IndexSorter then that
> rule no longer applies (and you must not care or you wouldn't have
> reordered everything)
>
> Second: using IndexSorter after an index is completley built is definitely
> a simpler, clearner, way of accomplishing something like this -- but it
> only seems adequate for situations in which "index building" is seperate
> and distinct from "index searching" ... I can't see how it would work very
> easily in situations where you are continuously performing incremental
> updates while searches are taking place.
>
> : The issue with Nutch's IndexSorter is that it allows you to reorder
> docids in
> : an arbitrary manner, using a user-supplied mapping between old and new
> docids,
> : which can be based on values retrieved from the current index or from
> any
> : other source. So I think this would be the main value of this class
> applicable
> : to various scenarios.
>
> No Argument what-so-ever.  IndexSorter seems like a sweet tool to have in
> the Lucene toolbox for letting people reordering the docs in an index by
> arbitrary criteria ... but for people with the specific case of
> *prefering* that recently added docs be in front of older docs, automatic
> segment reordering seems like it would also be a handy tool to have in the
> toolbox so that documents could "bubble up" gradually.  (maybe as a new
> MergePolicy? ... probably need some API changes to allow order to be
> specified)
>
> There would definitley be trade offs people would need to consdier before
> using it -- but those tradeoffs would probably also apply if they wanted
> to use IndexSorter.
>
>
>
> -Hoss
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message