lucene-java-user mailing list archives

From "Erick Erickson" <erickerick...@gmail.com>
Subject Re: reg : document number
Date Mon, 06 Nov 2006 01:03:53 GMT
Do NOT rely on the Lucene document number. It is not stable. As I
understand it, the general algorithm is that each doc gets an ID one greater
than the current max doc ID at INDEX time. However, when you delete
documents and optimize your index, the remaining docs are renumbered.
Simplistically, say you have docs indexed with IDs 1, 2, 3, 4, 5, then remove
2 and reoptimize. You then have IDs 1, 2, 3, 4, where the new 2, 3, 4 were
previously 3, 4, 5 respectively.

WARNING: I have no idea whether that's exactly how it works. The point is
that the doc IDs change. I wouldn't count on trying to match any algorithm
that Lucene uses....
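
To make the renumbering concrete, here is a toy sketch in plain Java. It is
NOT Lucene code and makes no claim about Lucene's internals: a list position
simply stands in for the document number (zero-based here, unlike the 1-based
example above), and the class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model only: a document's "number" is just its position in the list.
final class DocNumberShift {
    // Returns the surviving documents after deleting one and closing the gap.
    static List<String> deleteAndCompact(List<String> docs, int deletedNumber) {
        List<String> compacted = new ArrayList<>(docs);
        compacted.remove(deletedNumber); // remove(int) drops the entry at that position
        return compacted;
    }

    public static void main(String[] args) {
        List<String> before = Arrays.asList("a.txt", "b.txt", "c.txt", "d.txt", "e.txt");
        List<String> after = deleteAndCompact(before, 1); // delete b.txt
        for (int n = 0; n < after.size(); n++) {
            String doc = after.get(n);
            System.out.println(doc + ": was " + before.indexOf(doc) + ", now " + n);
        }
    }
}
```

Every document after the deleted one ends up with a smaller number, so any
number you saved elsewhere before the compaction now points at the wrong doc.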

But you don't need to anyway. Just assign your own document ID that *you*
can guarantee doesn't change (no relation to the Lucene ID), index it as a
field on each document, and search on that field. I believe you'll find that
you can search fast enough on such an ID that you won't notice the time. At
any rate, that's how I'd start out and only get fancier if performance proves
unacceptable.
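
A rough sketch of the idea, again as a toy model in plain Java rather than
real Lucene calls. The names (StableIdLookup, Doc, myId, findByMyId) and the
"file-001" style IDs are all hypothetical. In Lucene itself you would
typically put the ID in an untokenized field and look it up with a TermQuery
instead of the linear scan used here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model only: a list position plays the role of the internal doc number.
final class StableIdLookup {
    // A document carrying an application-assigned ID that never changes.
    static final class Doc {
        final String myId;
        final String path;
        Doc(String myId, String path) { this.myId = myId; this.path = path; }
    }

    // Finds a document by the application's own ID, ignoring its position.
    static Doc findByMyId(List<Doc> index, String myId) {
        for (Doc d : index) {
            if (d.myId.equals(myId)) return d;
        }
        return null; // no document carries that ID
    }

    public static void main(String[] args) {
        List<Doc> index = new ArrayList<>(Arrays.asList(
            new Doc("file-001", "a.txt"),
            new Doc("file-002", "b.txt"),
            new Doc("file-003", "c.txt")));
        index.remove(1); // delete b.txt; c.txt moves from position 2 to 1
        System.out.println(findByMyId(index, "file-003").path); // prints c.txt
    }
}
```

The lookup never mentions a position, so it keeps working no matter how the
documents get renumbered underneath it.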

Best
Erick

On 11/5/06, mukkamalla rama kumar <mukkamalla_r@yahoo.co.in> wrote:
>
> Hi,
>
>      How is this document number assigned to documents? Can I give my own
> document number?
>
>      I would like to get the document number for a particular file that I
> added to an index.
