lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Erick Erickson" <erickerick...@gmail.com>
Subject Re: how to get the programmatic control over index's document id
Date Thu, 14 Feb 2008 13:45:10 GMT
I'm a little confused about what's happening here

Are you inserting a pre-existing primary key from the DB into your
Lucene documents?

Or are you using the Lucene doc ID as your primary key in the database?
If this latter, you're going to have many, many problems since the Lucene
ID can, and will, change. Sometimes surprisingly. So in this case speeding
up your app is the least of your problems <G>....

Assuming you're doing the former, you don't have to call getDocument to get
the DB primary key from your Lucene document. Look at TermEnum/TermDocs
which can give you the Lucene doc ID *from the index* rather than from the
document.
Of course, you'll have to index the DB PK in your Lucene document, but that
doesn't take much space....

And TermDocs/TermEnum is MUCH faster than getDocument....

But what problem are you really trying to solve? It sounds like you're
searching the
Lucene index, then trying to use the DB primary keys that you get back from
the search to do something in the DB. If this isn't the case, perhaps you
could explain the problem you're trying to solve a little more.

Best
Erick


On Thu, Feb 14, 2008 at 6:59 AM, Gauri Shankar <gshankar.sahu@gmail.com>
wrote:

> Thanks a lot for both of you.
>
> yes, I am talking about internally assigned document id.
>
> Erick : I am already using the unique id into the index mapped to one of
> our
> DB's primary key to uniquely identify the docs from index.  Now to get the
> value of this unique field i need to call getDocumet(). But when the
> resultset is too large than this step is very slow as I am calling
> getDocument for each hit. Now I am using the FIELDCACHE and that has
> improved a lot. But I am thinking If I can find a way so that docId can be
> my unique ids and that will super optimize our search.
> Any suggestions??
>
> Thanks,
> Gauri Shankar
>
> On Sat, Feb 9, 2008 at 9:10 PM, Erick Erickson <erickerickson@gmail.com>
> wrote:
>
> > If you're referring to the internally-assigned document id, I don't
> think
> > there is a way. Assuming you're trying to assign one yourself or some
> > such.
> >
> > From all the discussions I've seen, I don't think there's even a faint
> > possibility that controlling this will be added to Lucene. Note that
> > existing IDs change as your index changes.
> >
> > Why do you care? What problem are you trying to solve? One common
> > suggestion is to create your own field (as Patrick suggests) that
> contains
> > your own unique ID. Using TermEnum/TermDocs will give you efficient
> > ways of going from your unique ID to a docID...
> >
> > Best
> > Erick
> >
> > On Feb 9, 2008 7:38 AM, Gauri Shankar <gshankar.sahu@gmail.com> wrote:
> >
> > > Hi,
> > >
> > > I would like to get the control over the docId field from my code. Can
> > > anyone suggest some way for doing the same?
> > >
> > >
> > > --
> > > Warm Regards,
> > > Gauri Shankar
> > >
> >
>
>
>
> --
> Warm Regards,
> Gauri Shankar
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message