lucene-java-user mailing list archives

From Erik Hatcher
Subject Re: Document numbers and ids
Date Fri, 04 Feb 2005 13:33:35 GMT

On Feb 4, 2005, at 12:24 PM, Simeon Koptelov wrote:
>> By "renumbered", it means it squeezes out holes left by deletes.  The
>> actual order does not change and thus does not affect a sort.
>> Documents are stored in the index in the order that they were indexed -
>> nothing changes this order.  Document ids are not permanent if deletes
>> occur followed by an optimize.
> Thanks for the clarification, Erik. Could you answer one more question:
> can I control the assignment of document numbers during indexing?

No, you cannot control Lucene's document id scheme - it is basically 
"for internal use".

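To make the renumbering point concrete, here is a toy model in plain
Java. It is not Lucene code: the class and method names are made up for
illustration, and it assumes deletes are tracked as one boolean flag per
document. It shows that surviving documents keep their relative order
but receive new numbers once the holes are squeezed out, so an id
recorded before the deletes may later refer to a different document.

```java
import java.util.Arrays;

// Hypothetical toy model of renumbering; not part of Lucene.
final class RenumberSketch {

    // Returns newNumber[old]: the number a document has after the holes
    // are squeezed out, or -1 if that document was deleted.
    static int[] renumber(boolean[] deleted) {
        int[] newNumber = new int[deleted.length];
        int next = 0;
        for (int old = 0; old < deleted.length; old++) {
            newNumber[old] = deleted[old] ? -1 : next++;
        }
        return newNumber;
    }

    public static void main(String[] args) {
        // Documents 1 and 4 were deleted before the optimize.
        boolean[] deleted = {false, true, false, false, true, false};
        // prints [0, -1, 1, 2, -1, 3]
        System.out.println(Arrays.toString(renumber(deleted)));
    }
}
```

The order of the survivors is unchanged, but old document 5 is now
document 3, which is why fixed id intervals per category would be
fragile.
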
> Maybe I should explain why I'm asking.
> I'm searching for documents, but for most (almost all) of them I don't
> really care about their content. I only want to know a particular
> numeric field from the document (the id of the document's category).
> I also need to know how many docs in each category were found, so I
> can't index categories instead of docs.
> The result set can be pretty big (30K) and all of it must be handled
> in an inner loop.
> So I want to use a HitCollector and assign intervals of ids to
> categories of documents. This way, there's no need to actually
> retrieve the document in the inner loop.
> Am I on the right track?

You should explore the use of IndexReader.  Index your documents with a
category id field, and use the methods on IndexReader to find all
unique categories (terms() returns a TermEnum you can walk).
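
A rough sketch of how that could feed a per-category hit count, in
plain Java so it runs without an index. Everything here is an
assumption for illustration: the class and method names are
hypothetical, and the postings map (category id -> document numbers) is
hand-built. Against a real index you would fill it by walking the
TermEnum for the category field and, for each term, the document
numbers from IndexReader.termDocs(term). The collecting loop stands in
for what a HitCollector's collect(doc, score) would do per hit.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch; models the lookup table, not the Lucene API itself.
final class CategoryHitCounter {

    // Builds docToCategory[doc] = category id, or -1 if the document
    // has no category. postings maps category id -> document numbers.
    static int[] buildDocToCategory(int maxDoc, Map<Integer, int[]> postings) {
        int[] docToCategory = new int[maxDoc];
        Arrays.fill(docToCategory, -1);
        for (Map.Entry<Integer, int[]> e : postings.entrySet()) {
            for (int doc : e.getValue()) {
                docToCategory[doc] = e.getKey();
            }
        }
        return docToCategory;
    }

    // Counts hits per category using only the array lookup, so no
    // stored document is loaded inside the loop.
    static Map<Integer, Integer> countHits(int[] hitDocs, int[] docToCategory) {
        Map<Integer, Integer> counts = new TreeMap<>();
        for (int doc : hitDocs) {
            int category = docToCategory[doc];
            if (category >= 0) {
                counts.merge(category, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<Integer, int[]> postings = new TreeMap<>();
        postings.put(7, new int[] {0, 2, 3}); // category 7 holds docs 0, 2, 3
        postings.put(9, new int[] {1, 4});    // category 9 holds docs 1, 4
        int[] docToCategory = buildDocToCategory(5, postings);
        // Suppose a search matched documents 0, 1 and 3.
        System.out.println(countHits(new int[] {0, 1, 3}, docToCategory)); // prints {7=2, 9=1}
    }
}
```

Because the table is keyed by document number, it would need to be
rebuilt whenever the reader is reopened after deletes and an optimize.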

