lucene-java-user mailing list archives

From Erik Hatcher <e...@ehatchersolutions.com>
Subject Re: Document numbers and ids
Date Fri, 04 Feb 2005 13:33:35 GMT

On Feb 4, 2005, at 12:24 PM, Simeon Koptelov wrote:
>> By "renumbered", it means it squeezes out holes left by deletes.  The
>> actual order does not change and thus does not affect a
>> Sort.INDEXORDER sort.
>>
>> Documents are stored in the index in the order that they were indexed -
>> nothing changes this order.  Document ids are not permanent if deletes
>> occur followed by an optimize.
>
> Thanks for the clarification, Erik. Could you answer one more question:
> can I control the assignment of document numbers during indexing?

No, you cannot control Lucene's document id scheme - it is basically 
"for internal use".

> Maybe I should explain why I'm asking.
> I'm searching for documents, but for most (almost all) of them I don't
> really care about their content. I only want to know a particular numeric
> field from each document (the id of the document's category).
> I also need to know how many docs in each category were found, so I can't
> index categories instead of docs.
> The result set can be pretty big (30K) and all of it must be handled in
> an inner loop.
> So I want to use HitCollector and assign intervals of ids to categories
> of documents. That way, there's no need to actually retrieve the document
> in the inner loop.
>
> Am I on the right track?

You should explore the use of IndexReader.  Index your documents with a
category id field, and use the methods on IndexReader to find all
unique categories (TermEnum).
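
For illustration only, here is a library-free sketch of how per-category
hit counts could be kept without retrieving any document in the inner
loop.  It assumes a lookup array from document number to category id has
been built once from the category field, for example by walking that
field's terms with TermEnum and TermDocs on the IndexReader.  The class
CategoryCounter and the array categoryOfDoc are hypothetical names, not
Lucene API; only the collect(int doc, float score) shape mirrors
HitCollector.  Because document numbers can change after deletes followed
by an optimize, such an array would have to be rebuilt whenever the index
changes.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of Lucene: counts hits per category id.
class CategoryCounter {
    // Assumption: index = document number, value = that document's category id.
    private final int[] categoryOfDoc;
    private final Map<Integer, Integer> counts = new TreeMap<>();

    CategoryCounter(int[] categoryOfDoc) {
        this.categoryOfDoc = categoryOfDoc;
    }

    // Same parameter shape as HitCollector.collect(int doc, float score).
    // Only an array lookup happens here; no document is loaded.
    void collect(int doc, float score) {
        counts.merge(categoryOfDoc[doc], 1, Integer::sum);
    }

    Map<Integer, Integer> counts() {
        return counts;
    }

    public static void main(String[] args) {
        // Five documents: docs 0 and 1 are in category 7, docs 2 and 4 in 3, doc 3 in 9.
        int[] categoryOfDoc = {7, 7, 3, 9, 3};
        CategoryCounter counter = new CategoryCounter(categoryOfDoc);

        // Pretend the search matched documents 0, 2 and 4.
        for (int doc : new int[] {0, 2, 4}) {
            counter.collect(doc, 1.0f);
        }
        System.out.println(counter.counts()); // prints {3=2, 7=1}
    }
}
```

In a real index the body of collect would sit inside a HitCollector
subclass passed to the search call, and the counts would be read once the
search returns.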

	Erik

