lucene-dev mailing list archives

From "Michael McCandless" <luc...@mikemccandless.com>
Subject Re: [jira] Updated: (LUCENE-843) improve how IndexWriter uses RAM to buffer added documents
Date Fri, 23 Mar 2007 08:53:15 GMT

"Chris Hostetter" <hossman_lucene@fucit.org> wrote:
> : > Actually is #2 a hard requirement?
> :
> : A lot of Lucene users depend on having document number correspond to
> : age, I think.  ISTR Hatcher at least recommending techniques that
> : require it.
> 
> "Correspond to age" may be misleading, as it implies that the actual
> docid has meaning ... it's more that the relative order of addition is
> preserved regardless of deletions/merging.
> 
> A trivial example of using this is getting the N newest documents
> matching a search using a HitCollector, it's just a bounded queue that
> only remembers the last N things you put in it.
> 
> A more complicated example is duplicate unique field detection:
> iterating over a TermDocs and knowing that the doc with the highest
> docId is the last one added, so the earlier ones can be ignored/deleted.
> (As I recall, Solr takes advantage of this.)
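
The first quoted example (keeping only the N newest matches) can be sketched roughly as below. This is a minimal illustration, not the Lucene HitCollector API: the class NewestHitsSketch and its methods are hypothetical names, and it assumes hits are delivered in increasing docid order, so the last N collected are the N most recently added.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical stand-in for a hit collector; not the Lucene HitCollector API.
// Assumption: collect() is called in increasing docid order, and a higher
// docid means the document was added later.
class NewestHitsSketch {
    private final int capacity;
    private final Deque<Integer> docIds = new ArrayDeque<>();

    NewestHitsSketch(int capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.capacity = capacity;
    }

    // Remember this hit, evicting the oldest remembered hit once full.
    void collect(int docId) {
        if (docIds.size() == capacity) {
            docIds.removeFirst();
        }
        docIds.addLast(docId);
    }

    // The remembered docids, oldest first.
    List<Integer> newest() {
        return new ArrayList<>(docIds);
    }
}
```

The queue never holds more than N entries, so memory stays bounded no matter how many documents match. The result is only "the N newest" if docid order really does track addition order.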

Got it, so we need to preserve this invariant.  That puts a general
restriction on the Lucene merge policy: only adjacent segments (i.e.,
adjacent when ordered by segment number) can be merged.
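
To illustrate why adjacency matters, here is a toy model, not actual Lucene code. It assumes each segment is simply a list of "addition sequence numbers" and that docids are assigned by walking the segments in order. The class and method names (AdjacentMergeSketch, mergeAdjacent, mergeNonAdjacent, additionOrderPreserved) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model, not Lucene code: a segment is a list of addition
// sequence numbers, and docids follow the order of segments in the index.
class AdjacentMergeSketch {

    // Merge the adjacent segments [from, to) into one segment in place.
    static List<List<Integer>> mergeAdjacent(List<List<Integer>> segments, int from, int to) {
        List<List<Integer>> result = new ArrayList<>(segments.subList(0, from));
        List<Integer> merged = new ArrayList<>();
        for (List<Integer> segment : segments.subList(from, to)) {
            merged.addAll(segment);
        }
        result.add(merged);
        result.addAll(segments.subList(to, segments.size()));
        return result;
    }

    // Merge segments a and b (a < b), putting the result at a's position,
    // even if other segments sit between them.
    static List<List<Integer>> mergeNonAdjacent(List<List<Integer>> segments, int a, int b) {
        List<List<Integer>> result = new ArrayList<>();
        for (int i = 0; i < segments.size(); i++) {
            if (i == a) {
                List<Integer> merged = new ArrayList<>(segments.get(a));
                merged.addAll(segments.get(b));
                result.add(merged);
            } else if (i != b) {
                result.add(segments.get(i));
            }
        }
        return result;
    }

    // True if a higher docid always means a later addition.
    static boolean additionOrderPreserved(List<List<Integer>> segments) {
        int previous = Integer.MIN_VALUE;
        for (List<Integer> segment : segments) {
            for (int sequenceNumber : segment) {
                if (sequenceNumber <= previous) {
                    return false;
                }
                previous = sequenceNumber;
            }
        }
        return true;
    }
}
```

In this model, merging the first two of three segments keeps every document in addition order. Merging the first and third segments places the third segment's documents ahead of the second's, so docid order no longer matches addition order.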

Mike
