lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Simon Willnauer <>
Subject Re: How to handle more than Integer.MAX_VALUE documents?
Date Tue, 02 Nov 2010 19:03:17 GMT
On Tue, Nov 2, 2010 at 1:58 AM, Lance Norskog <> wrote:
> 2billion is a hard limit. Usually people split indexes into multiple
> index long before this, and use the parallel multi reader (I think) to
> read from all of the sub-indexes.
> On Mon, Nov 1, 2010 at 2:16 PM, Zhang, Lisheng
> <> wrote:
>> Hi,
>> Now lucene uses integer as document id, so it means we cannot have more
>> than 2^31-1 documents within one collection? Even if we use MultiSearcher
>> the document id is still integer so it seems this is still a problem?

This is really the limit of a segment, I think you can write you own
collector and collect documents which higher (absolute) doc ids than
INT_MAX. Yet, I think if you reach the limit of INT_MAX documents you
should really rethink the way your search works and apply some
sharding techniques. I really haven't been up to that many docs in a
single index but I think it should work to have multiple segments with
INT_MAX documents in it since we search on segment level provided if
you collector supports it.

>> We have been using lucene for some time and our document count is growing
>> rather rapidly, maybe this is a much-discussed issue already, but I did not
>> find the lead, any pointer would be really appreciated.
>> Thanks very much for helps, Lisheng
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail:
>> For additional commands, e-mail:
> --
> Lance Norskog
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message