lucene-dev mailing list archives

From "Robert Engels" <reng...@ix.netcom.com>
Subject RE: Are Non-consecutive Document IDs feasible?
Date Tue, 11 Oct 2005 16:08:35 GMT
Just add another field to the document to hold your "external" document
identifier. That is essentially what the request is asking for: another
layer of indirection between identifiers and physical locations in the
index.
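
As a rough illustration of that indirection, here is a minimal plain-Java
sketch. It does not use the Lucene API; the class ExternalIdIndex and its
methods are hypothetical names invented for this example. Internal IDs are
assigned sequentially in insertion order, while the application-supplied
external ID stays stable and may have gaps.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of an index that keeps sequential internal docIDs
// and stores the application's external ID as ordinary per-document data.
final class ExternalIdIndex {
    private final Map<Long, Integer> externalToInternal = new HashMap<>();
    private final List<Long> internalToExternal = new ArrayList<>();

    // Adds a document and returns the internal ID, assigned in insertion order.
    int add(long externalId) {
        if (externalToInternal.containsKey(externalId)) {
            throw new IllegalArgumentException("duplicate external id: " + externalId);
        }
        int internalId = internalToExternal.size();
        internalToExternal.add(externalId);
        externalToInternal.put(externalId, internalId);
        return internalId;
    }

    // Returns the internal ID for an external ID, or -1 if it is unknown.
    int internalIdOf(long externalId) {
        Integer id = externalToInternal.get(externalId);
        return id == null ? -1 : id;
    }

    // Returns the external ID stored for an internal ID.
    long externalIdOf(int internalId) {
        return internalToExternal.get(internalId);
    }
}
```

Two indices built in different insertion orders will assign different
internal IDs to the same document, but both resolve it by the same external
ID, which is the stable identifier the application sees.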

-----Original Message-----
From: Shane O'Sullivan [mailto:shaneosullivan1@gmail.com]
Sent: Tuesday, October 11, 2005 10:59 AM
To: java-dev@lucene.apache.org
Subject: Are Non-consecutive Document IDs feasible?


Hi all,

As far as I understand, Lucene assigns docIDs to documents according
to the order in which the documents are added to the index. Hence, docIDs
are assigned by the engine in a sequential manner, without gaps. This order
of document identifiers then determines the order of the postings in the
postings lists, i.e. all postings lists are sorted by docID. It also means
that the same document appearing in two different indices would probably not
have the same docID (unless some extreme care was taken to insert documents
in the same order).
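
To make the "sorted by docID" point concrete, here is a small plain-Java
sketch of intersecting two postings lists with a linear merge. It is a
generic illustration, not Lucene's code; the class and method names are made
up for the example. The merge relies only on each list being in ascending
order, so the sample IDs deliberately contain gaps.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: intersects two postings lists sorted by docID.
final class PostingsIntersection {
    // Both inputs must be strictly ascending; gaps between IDs are allowed.
    static long[] intersect(long[] a, long[] b) {
        List<Long> out = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < a.length && j < b.length) {
            if (a[i] == b[j]) {
                out.add(a[i]);   // docID present in both lists
                i++;
                j++;
            } else if (a[i] < b[j]) {
                i++;             // advance the list that is behind
            } else {
                j++;
            }
        }
        long[] result = new long[out.size()];
        for (int k = 0; k < result.length; k++) {
            result[k] = out.get(k);
        }
        return result;
    }
}
```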

There are situations where the application wants to determine the docID for
the index, i.e. to control the ordering of occurrences in the postings
lists. This is useful to ensure, for example, that a document has a stable
and consistent document identifier regardless of the order in which it was
inserted into an index.

In such cases, the application would want to pass into the index the
numeric identifier of the document. However, such identifiers may not be
sequential, i.e. it's possible that there would be a document with docID M
without there being any document whose docID is M-1.

Q1. How difficult would it be to change Lucene to accept the docIDs from the
application, and not care about any possible gaps those IDs may have?
One possible problem is that, since the docIDs could become very large and
are non-sequential, creating a single array indexed by docID to cover them
all would not be feasible.
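
To illustrate the array concern, here is a plain-Java sketch of one possible
alternative: keep the sparse docIDs in a sorted array and find a document's
slot by binary search, so memory grows with the number of documents rather
than with the largest ID. This is only an assumption about how such a
structure could look, not a description of Lucene; SparseDocTable and the
per-document float value are hypothetical.

```java
import java.util.Arrays;
import java.util.NoSuchElementException;

// Hypothetical per-document table keyed by sparse, application-supplied docIDs.
final class SparseDocTable {
    private final long[] sortedIds; // ascending docIDs, gaps allowed
    private final float[] values;   // one value per document, parallel to sortedIds

    SparseDocTable(long[] sortedIds, float[] values) {
        if (sortedIds.length != values.length) {
            throw new IllegalArgumentException("ids and values must have the same length");
        }
        for (int i = 1; i < sortedIds.length; i++) {
            if (sortedIds[i - 1] >= sortedIds[i]) {
                throw new IllegalArgumentException("ids must be strictly ascending");
            }
        }
        this.sortedIds = sortedIds.clone();
        this.values = values.clone();
    }

    // Returns the dense slot for a docID, or -1 if no such document exists.
    int slotOf(long docId) {
        int pos = Arrays.binarySearch(sortedIds, docId);
        return pos >= 0 ? pos : -1;
    }

    // Returns the value stored for a docID, in O(log n) time.
    float valueOf(long docId) {
        int slot = slotOf(docId);
        if (slot < 0) {
            throw new NoSuchElementException("no document with id " + docId);
        }
        return values[slot];
    }
}
```

The trade-off is that a lookup costs a binary search instead of a direct
array index, and an ID such as 1,000,000,000,000 that could never serve as a
Java array index is handled without trouble.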

Q2. Does Lucene's search code depend on the fact that document IDs are
sequential?

Thanks

Shane


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

