lucene-java-user mailing list archives

From Michael McCandless <luc...@mikemccandless.com>
Subject Re: App supplied docID in lucene possible?
Date Mon, 05 Nov 2012 15:41:21 GMT
On Mon, Nov 5, 2012 at 4:37 AM, Ravikumar Govindarajan
<ravikumar.govindarajan@gmail.com> wrote:
> Thanks Mike,
>
> Joins could be slower than docID based approach, no?

Yes: slower at search time but faster at update time (generally not a
good tradeoff... but it seems like in your case slow updates are the
problem).

> It would be great if lucene can incorporate an external docID after
> weighing the pros & cons. Many like us will be willing to trade-off search
> latency to some extent, in return for the low hanging fruits

I think this would be very hard, for stored fields / term vectors /
doc values / field cache / deleted docs, which cannot store documents
"sparsely" today.
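
A rough illustration of the density problem, assuming a simplified
per-document structure that is addressed directly by docID (the class
and method names are hypothetical, not Lucene's):

```java
public class DenseStorageSketch {

    // A dense per-document structure (for example a deleted-docs bit set or a
    // field-cache array) is indexed by docID, so it needs maxDocID + 1 slots
    // no matter how many documents actually exist.
    static int slotsNeeded(int[] docIDs) {
        int max = -1;
        for (int docID : docIDs) {
            max = Math.max(max, docID);
        }
        return max + 1;
    }

    public static void main(String[] args) {
        int[] sequential = {0, 1, 2};            // IDs assigned in order by the index
        int[] appSupplied = {5, 42, 1_000_000};  // hypothetical app-chosen IDs
        System.out.println(slotsNeeded(sequential));
        System.out.println(slotsNeeded(appSupplied));
    }
}
```

In this sketch three sequential documents need 3 slots, while three
app-supplied IDs need 1000001 slots, almost all of them unused.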

Postings can store sparsely, but, when we write the postings in
IndexWriter's RAM buffer, we rely on docIDs being assigned "in order".
So if the app specified the docID, we'd have to change how we buffer
postings in RAM, and then fix flush to re-sort the docIDs before
writing the segment.
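
A minimal sketch of why in-order docIDs matter, assuming a simplified
delta-encoded postings list (hypothetical names; this is not how
IndexWriter is actually implemented):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified postings buffer for a single term.
public class PostingsBufferSketch {

    // Delta-encode docIDs that arrive in non-decreasing order: each entry
    // stores the gap from the previous docID, so an out-of-order ID cannot
    // be appended without rewriting what came before it.
    static List<Integer> appendInOrder(int[] docIDs) {
        List<Integer> deltas = new ArrayList<>();
        int last = 0;
        for (int docID : docIDs) {
            if (docID < last) {
                throw new IllegalArgumentException(
                    "docID " + docID + " is out of order after " + last);
            }
            deltas.add(docID - last);
            last = docID;
        }
        return deltas;
    }

    // With app-supplied docIDs the buffer would have to keep raw IDs and
    // re-sort them at flush time before encoding.
    static List<Integer> sortThenEncodeOnFlush(int[] bufferedDocIDs) {
        int[] sorted = bufferedDocIDs.clone();
        Arrays.sort(sorted);
        return appendInOrder(sorted);
    }

    public static void main(String[] args) {
        System.out.println(appendInOrder(new int[] {3, 7, 8, 20}));
        System.out.println(sortThenEncodeOnFlush(new int[] {20, 3, 8, 7}));
    }
}
```

Both calls produce the same deltas; the difference is that the second
path pays for a sort at flush and cannot encode as documents arrive.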

We have discussed sorting docIDs on flush before (e.g., you can reduce
postings size if you sort similar documents "together"), but I don't
know of anyone implementing that.

Also lots of places at search time rely on a docID being the sum of a
segment's docBase and the docID within the segment ... that would have
to change to just use the decoded docID directly.
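
The docBase arithmetic can be sketched as follows, assuming non-empty
segments laid out back to back (hypothetical helper names, written in
plain Java rather than against Lucene's API):

```java
import java.util.Arrays;

public class DocBaseSketch {

    // docBases[i] is the total number of docs in all segments before segment i.
    static int[] docBases(int[] segmentMaxDocs) {
        int[] bases = new int[segmentMaxDocs.length];
        int sum = 0;
        for (int i = 0; i < segmentMaxDocs.length; i++) {
            bases[i] = sum;
            sum += segmentMaxDocs[i];
        }
        return bases;
    }

    // Index-wide docID = segment's docBase + docID within the segment.
    static int toGlobal(int docBase, int localDocID) {
        return docBase + localDocID;
    }

    // Find the segment holding a global docID: the last segment whose
    // docBase is <= globalDocID. Assumes every segment is non-empty, so
    // the docBases are strictly increasing.
    static int segmentOf(int[] docBases, int globalDocID) {
        int idx = Arrays.binarySearch(docBases, globalDocID);
        return idx >= 0 ? idx : -idx - 2;
    }

    public static void main(String[] args) {
        int[] bases = docBases(new int[] {100, 50, 200});
        int global = toGlobal(bases[1], 20);
        int segment = segmentOf(bases, global);
        System.out.println(global + " is in segment " + segment
            + " at local docID " + (global - bases[segment]));
    }
}
```

This mapping only works because docIDs are dense and contiguous within
each segment; app-supplied docIDs would break that assumption.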

Mike McCandless

http://blog.mikemccandless.com
