lucene-java-user mailing list archives

From Ravikumar Govindarajan <ravikumar.govindara...@gmail.com>
Subject Re: App supplied docID in lucene possible?
Date Mon, 05 Nov 2012 09:37:30 GMT
Thanks Mike,

Wouldn't joins be slower than a docID-based approach?

It would be great if Lucene could incorporate an external docID after
weighing the pros and cons. Many users like us would be willing to trade
some search latency in return for this low-hanging fruit.

---
Ravi

On Fri, Nov 2, 2012 at 9:06 PM, Michael McCandless <lucene@mikemccandless.com> wrote:

> I suspect app-controlled docID will be a challenge, but I haven't
> thought it through much.
>
> One possible solution might be to use joins?  Either index time or
> query time....
>
> I.e., make one document that has the big text field that never changes,
> and a separate document that has all the little fields that frequently
> change, joined by a common field.
>
> Then you can freely update the little fields without changing the big
> field.
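
A minimal sketch of the split-document idea above, in plain Java rather than Lucene code. The class and method names (SplitDocStore, addStatic, updateSmallField, lookup) are hypothetical, and the two in-memory maps only stand in for the two kinds of indexed documents joined by a common field. Real Lucene joins work differently and carry their own query-time or index-time costs.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Toy model of "big static document + small volatile document"; not Lucene API. */
final class SplitDocStore {

    /** The two halves of a logical document, joined back together. */
    static final class Joined {
        final String joinKey;
        final String bigText;
        final Map<String, Long> smallFields;

        Joined(String joinKey, String bigText, Map<String, Long> smallFields) {
            this.joinKey = joinKey;
            this.bigText = bigText;
            this.smallFields = smallFields;
        }
    }

    // Stand-in for documents holding the large text field that never changes.
    private final Map<String, String> staticDocs = new HashMap<>();
    // Stand-in for documents holding the small, frequently updated fields.
    private final Map<String, Map<String, Long>> volatileDocs = new HashMap<>();
    // Counts how often the expensive static half had to be written.
    private int staticWrites = 0;

    void addStatic(String joinKey, String bigText) {
        staticDocs.put(joinKey, bigText);
        staticWrites++;
    }

    /** Updates a small field without touching the static half at all. */
    void updateSmallField(String joinKey, String field, long value) {
        volatileDocs.computeIfAbsent(joinKey, k -> new HashMap<>()).put(field, value);
    }

    /** Joins the two halves on the common key, if the static half exists. */
    Optional<Joined> lookup(String joinKey) {
        String text = staticDocs.get(joinKey);
        if (text == null) {
            return Optional.empty();
        }
        Map<String, Long> small = volatileDocs.get(joinKey);
        Map<String, Long> copy = (small == null)
                ? Collections.<String, Long>emptyMap()
                : Collections.unmodifiableMap(new HashMap<>(small));
        return Optional.of(new Joined(joinKey, text, copy));
    }

    int staticWrites() {
        return staticWrites;
    }
}
```

In this model, any number of calls to updateSmallField leaves staticWrites unchanged, which is the property the suggestion is after. The price is the extra lookup needed to join the two halves on every read.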
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
> On Thu, Oct 25, 2012 at 6:10 AM, Ravikumar Govindarajan
> <ravikumar.govindarajan@gmail.com> wrote:
> > We have the need to re-index some fields in our application frequently.
> >
> > Our typical document consists of
> >
> > a) Many single-valued {long/int} re-indexable fields
> > b) Few large-valued {text/string} static fields
> >
> > We have to re-index an entire document if a single smallish field changes,
> > and it is turning out to be a problem for us. I have gone through the
> > https://issues.apache.org/jira/browse/LUCENE-3837 proposal, which tries
> > to work around this limitation using a secondary mapping of new-old
> > docIDs.
> >
> > As I understand it, Lucene strictly maintains internal doc-id order so
> > that the many queries that depend on it work correctly. Segment merges
> > also maintain this order, as well as reclaim deleted doc-ids.
> >
> > There should be many applications like ours that manage index shards,
> > limiting a given shard based on doc-id limits or size. So reclaiming
> > deleted doc-ids is mostly a non-issue for us.
> >
> > That leaves us with changing doc-ids. How about opening up the doc-ids
> > themselves to applications, at least as an option for those who need it?
> > Taking such an approach might interleave doc-ids across segments, but
> > within a segment the docIDs would always be in increasing order. There
> > are possibilities of ghost deletes, duplicate docIDs, etc., but all
> > should be solvable, I believe.
> >
> > Fronting these doc-ids during search across all segment readers and
> > returning the correct value from one of them should be easy. Will it
> > incur a heavy penalty during search? Another advantage is that
> > cross-joining indexes becomes trivial when docIDs are fixed.
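
One way to picture the "fronting" step described above is sketched here in plain Java. This is only an illustration under stated assumptions, not how Lucene works: AppDocIdResolver, Segment and resolveSegment are hypothetical names, each segment is assumed to hold unique app-supplied docIDs in increasing order, and the newest segment containing a live copy is assumed to win when the same docID appears in several segments.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

/** Toy resolver for app-supplied docIDs spread over several segments. */
final class AppDocIdResolver {

    /** A segment holding app-supplied docIDs, kept in increasing order. */
    static final class Segment {
        final long[] sortedAppDocIds;
        final BitSet deleted = new BitSet();

        Segment(long... appDocIds) {
            sortedAppDocIds = appDocIds.clone();
            Arrays.sort(sortedAppDocIds);
        }

        /** Marks the docID as deleted in this segment only. */
        boolean delete(long appDocId) {
            int pos = Arrays.binarySearch(sortedAppDocIds, appDocId);
            if (pos < 0) {
                return false;
            }
            deleted.set(pos);
            return true;
        }
    }

    // Segments in creation order: index 0 is the oldest.
    private final List<Segment> segments = new ArrayList<>();

    void addSegment(Segment segment) {
        segments.add(segment);
    }

    /**
     * Returns the index of the newest segment with a live copy of the
     * docID, or -1 if no segment has one. Each lookup costs one binary
     * search per segment.
     */
    int resolveSegment(long appDocId) {
        for (int i = segments.size() - 1; i >= 0; i--) {
            Segment s = segments.get(i);
            int pos = Arrays.binarySearch(s.sortedAppDocIds, appDocId);
            if (pos >= 0 && !s.deleted.get(pos)) {
                return i;
            }
        }
        return -1;
    }
}
```

Even this toy version shows two of the concerns raised in the message. Every lookup has to probe each segment, so the cost grows with the segment count. And deleting only the newest copy of a duplicated docID makes an older copy visible again, so deletes would have to be applied to every segment that holds the docID.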
> >
> > There must be many other places where an app-supplied docID might make
> > Lucene behave oddly. I need some help identifying those areas, at least
> > for understanding this problem correctly, if not solving it altogether.
> >
> > --
> > Ravi
>
