lucene-java-user mailing list archives

From Michael McCandless
Subject Re: App supplied docID in lucene possible?
Date Fri, 02 Nov 2012 15:36:24 GMT
I suspect app-controlled docID will be a challenge, but I haven't
thought it through much.

One possible solution might be to use joins, either at index time or
at query time.

I.e., make one document that has the big text field that never changes,
and a separate document that has all the little fields that frequently
change, joined by a common field.

Then you can freely update the little fields without changing the big field.
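
A minimal sketch of that layout in plain Java, using in-memory maps
rather than Lucene's API. The class name, the joinKey parameter, and all
method names are hypothetical; it only illustrates that updating the
little fields never rewrites the big text, and that a query-time join
resolves matches through the common field.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model of the split-document idea; hypothetical names, not Lucene API.
final class SplitDocSketch {
    // Big, static text, keyed by an application-chosen join key.
    private final Map<String, String> bigDocs = new HashMap<>();
    // Small, frequently changing fields, keyed by the same join key.
    private final Map<String, Map<String, Long>> smallDocs = new LinkedHashMap<>();
    // Counts writes of the big text, to show that updates do not touch it.
    private int bigWrites = 0;

    void addDocument(String joinKey, String bigText, Map<String, Long> smallFields) {
        bigDocs.put(joinKey, bigText);
        bigWrites++;
        smallDocs.put(joinKey, new HashMap<>(smallFields));
    }

    // Replaces only the small "document"; the big text is left alone.
    void updateSmallFields(String joinKey, Map<String, Long> smallFields) {
        if (!bigDocs.containsKey(joinKey)) {
            throw new IllegalArgumentException("unknown join key: " + joinKey);
        }
        smallDocs.put(joinKey, new HashMap<>(smallFields));
    }

    // Query-time join: match on a small field, then follow the join key to the big text.
    List<String> bigTextWhere(String field, long value) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, Map<String, Long>> e : smallDocs.entrySet()) {
            Long v = e.getValue().get(field);
            if (v != null && v == value) {
                hits.add(bigDocs.get(e.getKey()));
            }
        }
        return hits;
    }

    int bigWrites() {
        return bigWrites;
    }
}
```

In a real index the two maps would be two kinds of documents sharing a
join field, and the lookup would be done by a join query instead of a
loop.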

Mike McCandless

On Thu, Oct 25, 2012 at 6:10 AM, Ravikumar Govindarajan wrote:
> We have the need to re-index some fields in our application frequently.
> Our typical document consists of
> a) Many single-valued {long/int} re-indexable fields
> b) Few large-valued {text/string} static fields
> We have to re-index an entire document if a single smallish field changes,
> and it is turning out to be a problem for us. I have gone through the
> proposal that tries to work around this limitation using a secondary
> mapping of new-to-old doc-ids.
> As I understand it, lucene strictly maintains internal doc-id order so
> that the many queries that depend on it will work correctly. Segment
> merges also maintain order, as well as reclaim deleted doc-ids.
> There should be many applications like ours, which manage index shards
> by limiting a given shard based on doc-id limits or size. So reclaiming
> deleted doc-ids is mostly a non-issue for us.
> That leaves us with changing doc-ids. How about leaving the doc-ids
> themselves open to the applications, at least as an option for the needy?
> Taking such an approach might interleave doc-ids across segments, but
> within a segment, the docIds are always in increasing order. There are
> possibilities of ghost-deletes, duplicate docIds, etc., but all should be
> solvable, I believe.
> Fronting these doc-ids during search from all segment readers and returning
> the correct value from one of them should be easy. Will it incur a heavy
> penalty during search? Another advantage gained is the triviality of
> cross-joining indexes when docIDs are fixed.
> There must be many other places where an app-supplied docId might make
> lucene behave funny. Need some help in identifying those areas, at least
> for understanding this problem correctly, if not solving it altogether.
> --
> Ravi
