lucene-java-user mailing list archives

From Deepak Shakya <just...@gmail.com>
Subject Re: App supplied docID in lucene possible?
Date Fri, 02 Nov 2012 11:41:29 GMT
Why don't you store your app-supplied doc ID as one of the fields of the
document? You can then add, update, and delete documents based on that field.
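
A minimal sketch of that idea in plain Java, not actual Lucene code. The class
AppKeyIndexSketch and its methods are hypothetical names made up for this
illustration. It models an index where the app-supplied key is an ordinary
field and the slot number stands in for Lucene's internal docID. In Lucene
itself the corresponding calls would be IndexWriter.updateDocument(Term, ...)
and IndexWriter.deleteDocuments(Term), with the Term built from the key field.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model (hypothetical, not Lucene code): documents addressed by an app-supplied key. */
final class AppKeyIndexSketch {
    // The slot position plays the role of the internal docID; null marks a deleted slot.
    private final List<String> slots = new ArrayList<>();
    // App-supplied key -> current internal slot.
    private final Map<String, Integer> keyToSlot = new HashMap<>();

    /** Adds or replaces the document stored under appKey; returns its new internal slot. */
    int update(String appKey, String content) {
        delete(appKey);                 // an update is a delete by key ...
        slots.add(content);             // ... followed by appending the new copy
        int slot = slots.size() - 1;
        keyToSlot.put(appKey, slot);
        return slot;
    }

    /** Deletes the document stored under appKey; returns true if one existed. */
    boolean delete(String appKey) {
        Integer slot = keyToSlot.remove(appKey);
        if (slot == null) {
            return false;
        }
        slots.set(slot, null);          // the slot is marked deleted, not reused
        return true;
    }

    /** Returns the content stored under appKey, or null if there is none. */
    String get(String appKey) {
        Integer slot = keyToSlot.get(appKey);
        return slot == null ? null : slots.get(slot);
    }

    /** Returns the current internal slot for appKey, or null if there is none. */
    Integer slotOf(String appKey) {
        return keyToSlot.get(appKey);
    }

    public static void main(String[] args) {
        AppKeyIndexSketch index = new AppKeyIndexSketch();
        index.update("msg-42", "flag=unread");
        int before = index.slotOf("msg-42");
        index.update("msg-42", "flag=read");
        int after = index.slotOf("msg-42");
        // prints: 0 -> 1 : flag=read
        System.out.println(before + " -> " + after + " : " + index.get("msg-42"));
    }
}
```

Note that an update by key is still a delete followed by a re-add of the whole
document, so the internal slot changes while the app-supplied key stays
stable. The key field gives you a stable handle, but it does not by itself
avoid re-indexing the large static fields.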


On Fri, Nov 2, 2012 at 4:55 PM, Ravikumar Govindarajan <
ravikumar.govindarajan@gmail.com> wrote:

> I am aware of ExternalFileField, but the docID solution looks more elegant
> and performant.
>
> Our daily re-indexing rate is around 35-40% of index additions.
>
> When a small int or boolean value in a Lucene document changes, I need to
> re-index the entire 5-10MB of content again. This is why I am looking into
> manipulating Lucene's docID.
>
> In our case, sorting can be fully eliminated if Lucene supports
> app-supplied docIDs. Early query termination should also be possible with
> such an approach.
>
> I know that IndexReader, SegmentMerge and IndexWriter will be affected. I
> would like to know what other areas of Lucene would be affected by such an
> approach.
>
> ---
> Ravi
>
> On Thu, Oct 25, 2012 at 8:20 PM, Jack Krupansky <jack@basetechnology.com>
> wrote:
>
> > Have you looked at or decided against an approach like Solr's
> > ExternalFileField?
> >
> > See:
> > http://lucene.apache.org/solr/4_0_0/solr-core/org/apache/solr/schema/ExternalFileField.html
> >
> > Is that at least the kind of issue you are trying to deal with?
> >
> > One final question: How much of a document's field values are stable vs.
> > frequently changing? What are the numbers here - total field count, count
> > of frequently changed fields, and percentage of documents being updated
> > in some period of time?
> >
> > And, I don't quite follow why you can't just use a unique key for a
> > document rather than the low-level Lucene document id.
> >
> > -- Jack Krupansky
> >
> > -----Original Message----- From: Ravikumar Govindarajan
> > Sent: Thursday, October 25, 2012 6:10 AM
> > To: java-user@lucene.apache.org
> > Subject: App supplied docID in lucene possible?
> >
> >
> > We have the need to re-index some fields in our application frequently.
> >
> > Our typical document consists of
> >
> > a) Many single-valued {long/int} re-indexable fields
> > b) Few large-valued {text/string} static fields
> >
> > We have to re-index an entire document if a single smallish field changes,
> > and it is turning out to be a problem for us. I have gone through the
> > https://issues.apache.org/jira/browse/LUCENE-3837 proposal, which tries to
> > work around this limitation using a secondary mapping of new-old docids.
> >
> > As I understand it, Lucene strictly maintains internal doc-id order so
> > that the many queries that depend on it work correctly. Segment merges
> > also maintain that order and reclaim deleted doc-ids.
> >
> > There should be many applications like us, which manage index shards
> > limiting a given shard based on doc-id limits or size. So reclaiming
> > deleted doc-ids is mostly a non-issue for us.
> >
> > That leaves us with changing doc-ids. How about opening up the doc-ids
> > themselves to applications, at least as an option for those who need it?
> > Taking such an approach might interleave doc-ids across segments, but
> > within a segment, the docIds are always in increasing order. There are
> > possibilities of ghost deletes, duplicate docIds, etc., but all should be
> > solvable, I believe.
> >
> > Fronting these doc-ids during search across all segment readers and
> > returning the correct value from one of them should be easy. Will it
> > incur a heavy penalty during search? Another advantage gained is the
> > triviality of cross-joining indexes when docIDs are fixed.
> >
> > There must be many other places where an app-supplied docId might make
> > Lucene behave funny. I need some help identifying those areas, at least
> > for understanding this problem correctly, if not solving it altogether.
> >
> > --
> > Ravi
> >



-- 
With Regards,
Deepak Shakya
http://www.google.com/profiles/justdpk
