lucene-dev mailing list archives

From Erik Hatcher <>
Subject Re: Are Non-consecutive Document IDs feasible?
Date Tue, 11 Oct 2005 16:11:45 GMT
How about indexing a field with your application-centric id?  This is  
_the_ way this sort of thing is handled.  You could then query for a  
specific id using a TermQuery.
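The suggestion can be illustrated with a toy model in plain Java. This is not Lucene's API: `ToyIndex`, `addDocument` and `termQuery` are hypothetical names. The model assigns internal docIDs sequentially on insertion, as described in the quoted message, and keeps the application's id as an indexed field. Two indexes built in different orders give the same document different internal docIDs, yet a lookup on the id field still finds it. In Lucene itself the equivalent lookup is `new TermQuery(new Term("id", value))` on an untokenized id field.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model (hypothetical, NOT Lucene's API): internal docIDs are assigned
// sequentially in insertion order, and each document carries an
// application-level "id" field indexed as a single untokenized term.
public class AppIdFieldSketch {

    static class ToyIndex {
        // Stored "id" field values, indexed by internal docID.
        private final List<String> storedIds = new ArrayList<>();
        // Inverted index for the "id" field: term -> postings (ascending docIDs).
        private final Map<String, List<Integer>> idPostings = new HashMap<>();

        /** Adds a document and returns the internal docID it was assigned. */
        int addDocument(String appId) {
            int docId = storedIds.size();   // next sequential docID, no gaps
            storedIds.add(appId);
            idPostings.computeIfAbsent(appId, k -> new ArrayList<>()).add(docId);
            return docId;
        }

        /** Rough analogue of a TermQuery on the "id" field. */
        List<Integer> termQuery(String appId) {
            return idPostings.getOrDefault(appId, List.of());
        }
    }

    public static void main(String[] args) {
        ToyIndex a = new ToyIndex();
        ToyIndex b = new ToyIndex();
        // Same three documents, added in a different order to each index.
        for (String id : new String[] {"doc-42", "doc-7", "doc-1000"}) a.addDocument(id);
        for (String id : new String[] {"doc-1000", "doc-42", "doc-7"}) b.addDocument(id);

        // Internal docIDs differ between the two indexes: prints "[0] [1]"
        System.out.println(a.termQuery("doc-42") + " " + b.termQuery("doc-42"));
        // The application id still resolves to the same logical document: prints "doc-42"
        System.out.println(a.storedIds.get(a.termQuery("doc-42").get(0)));
    }
}
```

The application never needs to know or control the internal docID. It only ever queries by its own id.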


On Oct 11, 2005, at 11:58 AM, Shane O'Sullivan wrote:

> Hi all,
>
> As far as I understand today, Lucene assigns docIDs to documents according
> to the order in which the documents are added to the index. Hence, docIDs
> are assigned by the engine in a sequential manner, without gaps. This order
> of document identifiers then determines the order of the postings in the
> postings lists, i.e. all postings lists are sorted by docID. It also means
> that the same document appearing in two different indices would probably
> not have the same docID (unless some extreme care was taken to insert
> documents in the same order).
>
> There are situations where the application wants to determine the docID
> for the index, i.e. to control the ordering of occurrences in the postings
> lists. This is useful to ensure, for example, that a document has a stable
> and consistent document identifier regardless of the order in which it is
> inserted into an index.
>
> In either case, the application would want to pass into the index the
> numeric identifier of the document. However, such identifiers may not be
> sequential, i.e. it's possible that there would be a document with docID M
> without there being any document whose docID is M-1.
>
> Q1. How difficult would it be to change Lucene to accept the docIDs from
> the application, and not care about any possible gaps those IDs may have?
> One possible problem is that, since the docIDs could become very large and
> are non-sequential, creating a single array for them all would not be
> feasible.
>
> Q2. Does Lucene's search code depend on the fact that document IDs are
> sequential?
>
> Thanks
> Shane
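The array concern in Q1 can be made concrete. Below is a minimal sketch in plain Java, not Lucene code. It assumes the full set of application IDs is known before indexing, and `insertionOrder` and `internalDocId` are hypothetical helpers. Adding documents in ascending application-ID order (the "extreme care" in insertion order mentioned above) makes the dense internal docIDs follow the application's ordering. A per-document array indexed by internal docID then needs one slot per document. Indexing the same array directly by a sparse application ID would need max(id) + 1 slots.

```java
import java.util.Arrays;

// Sketch (assumption, not Lucene code): if all application IDs are known up
// front, adding documents in ascending application-ID order yields internal
// docIDs that follow the application's ordering while staying dense.
public class DenseRemapSketch {

    /** Hypothetical helper: the order in which to add documents (ascending app ID). */
    static long[] insertionOrder(long[] appIds) {
        long[] sorted = appIds.clone();
        Arrays.sort(sorted);
        return sorted;   // internal docID i is assigned to sorted[i]
    }

    /** Hypothetical helper: internal docID for an app ID, or -1 if absent. */
    static int internalDocId(long[] sortedAppIds, long appId) {
        int pos = Arrays.binarySearch(sortedAppIds, appId);
        return pos >= 0 ? pos : -1;
    }

    public static void main(String[] args) {
        long[] appIds = {9_000_000_000L, 17L, 123_456_789L};
        long[] order = insertionOrder(appIds);

        // A per-document array indexed by dense internal docID needs one slot per doc...
        byte[] perDocDense = new byte[order.length];
        // ...whereas indexing it directly by the sparse app ID needs max(appId) + 1 slots.
        long sparseSlots = order[order.length - 1] + 1;

        System.out.println(perDocDense.length + " vs " + sparseSlots); // prints "3 vs 9000000001"
        System.out.println(internalDocId(order, 123_456_789L));        // prints "1"
    }
}
```

This only preserves ordering within a single, fully rebuilt index. It does not by itself answer Q2 or handle documents whose IDs arrive after indexing has started.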
