lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Lance Norskog <goks...@gmail.com>
Subject Re: How to handle more than Integer.MAX_VALUE documents?
Date Wed, 03 Nov 2010 02:00:20 GMT
You would have to control your MergePolicy so it doesn't collapse
everything back to one segment.

On Tue, Nov 2, 2010 at 12:03 PM, Simon Willnauer
<simon.willnauer@googlemail.com> wrote:
> On Tue, Nov 2, 2010 at 1:58 AM, Lance Norskog <goksron@gmail.com> wrote:
>> 2billion is a hard limit. Usually people split indexes into multiple
>> index long before this, and use the parallel multi reader (I think) to
>> read from all of the sub-indexes.
>>
>> On Mon, Nov 1, 2010 at 2:16 PM, Zhang, Lisheng
>> <Lisheng.Zhang@broadvision.com> wrote:
>>>
>>> Hi,
>>>
>>> Now lucene uses integer as document id, so it means we cannot have more
>>> than 2^31-1 documents within one collection? Even if we use MultiSearcher
>>> the document id is still integer so it seems this is still a problem?
>
> This is really the limit of a segment, I think you can write you own
> collector and collect documents which higher (absolute) doc ids than
> INT_MAX. Yet, I think if you reach the limit of INT_MAX documents you
> should really rethink the way your search works and apply some
> sharding techniques. I really haven't been up to that many docs in a
> single index but I think it should work to have multiple segments with
> INT_MAX documents in it since we search on segment level provided if
> you collector supports it.
>
> simon
>>>
>>> We have been using lucene for some time and our document count is growing
>>> rather rapidly, maybe this is a much-discussed issue already, but I did not
>>> find the lead, any pointer would be really appreciated.
>>>
>>> Thanks very much for helps, Lisheng
>>>
>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>
>>>
>>
>>
>>
>> --
>> Lance Norskog
>> goksron@gmail.com
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>



-- 
Lance Norskog
goksron@gmail.com

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message