lucene-java-user mailing list archives

From Michael McCandless <luc...@mikemccandless.com>
Subject Re: App supplied docID in lucene possible?
Date Fri, 02 Nov 2012 15:36:24 GMT
I suspect app-controlled docID will be a challenge, but I haven't
thought it through much.

One possible solution might be to use joins, either at index time or
at query time.

I.e., make one document that holds the big text field that never changes,
and a separate document that holds all the little fields that change
frequently, joined by a common field.

Then you can freely update the little fields without changing the big field.
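
As a rough illustration of the idea, here is a minimal sketch in plain Java.
It does not use any Lucene API: the class JoinSketch, its methods, and the
field names are all hypothetical, and two in-memory maps stand in for the
"big" and "little" documents that share a join key.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Conceptual model of the two-document join. Hypothetical, not Lucene API. */
final class JoinSketch {
    // "Big" documents: join key -> large static text, written once.
    private final Map<String, String> staticDocs = new HashMap<>();
    // "Little" documents: join key -> small numeric fields that change often.
    private final Map<String, Map<String, Long>> volatileDocs = new HashMap<>();

    /** Adds the large text document for a join key. */
    void addStatic(String key, String text) {
        staticDocs.put(key, text);
    }

    /** Updates one small field; the big text document is never touched. */
    void updateVolatile(String key, String field, long value) {
        volatileDocs.computeIfAbsent(key, k -> new HashMap<>()).put(field, value);
    }

    /**
     * Query-time join: keep the little documents whose field is at least min,
     * follow the join key to the big document, and keep those whose text
     * contains the term. Returns the matching join keys in sorted order.
     */
    List<String> search(String term, String field, long min) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, Map<String, Long>> e : volatileDocs.entrySet()) {
            Long value = e.getValue().get(field);
            if (value == null || value < min) {
                continue;
            }
            String text = staticDocs.get(e.getKey());
            if (text != null && text.contains(term)) {
                hits.add(e.getKey());
            }
        }
        Collections.sort(hits);
        return hits;
    }
}
```

In a real index the two maps would be two kinds of documents carrying the
same key field, and the lookup in search() would be done by a join query
rather than a loop.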

Mike McCandless

http://blog.mikemccandless.com

On Thu, Oct 25, 2012 at 6:10 AM, Ravikumar Govindarajan
<ravikumar.govindarajan@gmail.com> wrote:
> We have the need to re-index some fields in our application frequently.
>
> Our typical document consists of
>
> a) Many single-valued {long/int} re-indexable fields
> b) Few large-valued {text/string} static fields
>
> We have to re-index an entire document if a single smallish field changes,
> and this is turning out to be a problem for us. I have gone through the
> https://issues.apache.org/jira/browse/LUCENE-3837 proposal, which tries
> to work around this limitation using a secondary mapping between new and
> old doc-ids.
>
> As I understand it, Lucene strictly maintains internal doc-id order so that
> the many queries that depend on it work correctly. Segment merges also
> maintain this order and reclaim deleted doc-ids.
>
> There should be many applications like ours that manage index shards,
> limiting a given shard based on doc-id limits or size. So reclaiming
> deleted doc-ids is mostly a non-issue for us.
>
> That leaves us with changing doc-ids. How about opening up the doc-ids
> themselves to applications, at least as an option for those who need it?
> Taking such an approach might interleave doc-ids across segments, but within
> a segment the doc-ids would always be in increasing order. There are
> possibilities of ghost deletes, duplicate doc-ids, etc., but all should be
> solvable, I believe.
>
> Fronting these doc-ids during search from all segment readers and returning
> the correct value from one of them should be easy. Will it incur a heavy
> penalty during search? Another advantage is that cross-joining indexes
> becomes trivial when doc-ids are fixed.
>
> There must be many other places where an app-supplied doc-id might make
> Lucene behave funny. I need some help identifying those areas, at least for
> understanding this problem correctly, if not solving it altogether.
>
> --
> Ravi
