lucene-java-user mailing list archives

From "Zhang, Lisheng" <Lisheng.Zh...@BroadVision.com>
Subject RE: How to handle more than Integer.MAX_VALUE documents?
Date Wed, 03 Nov 2010 15:56:52 GMT
Thanks very much, I got it.

-----Original Message-----
From: Simon Willnauer [mailto:simon.willnauer@googlemail.com]
Sent: Tuesday, November 02, 2010 11:28 PM
To: Lance Norskog
Cc: java-user@lucene.apache.org
Subject: Re: How to handle more than Integer.MAX_VALUE documents?


On Wed, Nov 3, 2010 at 3:00 AM, Lance Norskog <goksron@gmail.com> wrote:
> You would have to control your MergePolicy so it doesn't collapse
> everything back to one segment.
maxMergeDocs is an int too, though!

simon
>
> On Tue, Nov 2, 2010 at 12:03 PM, Simon Willnauer
> <simon.willnauer@googlemail.com> wrote:
>> On Tue, Nov 2, 2010 at 1:58 AM, Lance Norskog <goksron@gmail.com> wrote:
>>> 2 billion is a hard limit. Usually people split an index into multiple
>>> indexes long before this, and use the parallel multi reader (I think) to
>>> read from all of the sub-indexes.
>>>
>>> On Mon, Nov 1, 2010 at 2:16 PM, Zhang, Lisheng
>>> <Lisheng.Zhang@broadvision.com> wrote:
>>>>
>>>> Hi,
>>>>
>>>> Lucene uses an int as the document id, so does this mean we cannot have
>>>> more than 2^31-1 documents within one collection? Even if we use
>>>> MultiSearcher the document id is still an int, so it seems this is still a
>>>> problem?
>>
>> This is really the limit of a segment. I think you can write your own
>> collector and collect documents with higher (absolute) doc ids than
>> INT_MAX. Yet, I think if you reach the limit of INT_MAX documents you
>> should really rethink the way your search works and apply some
>> sharding techniques. I really haven't been up to that many docs in a
>> single index, but I think it should work to have multiple segments with
>> INT_MAX documents in them, since we search at the segment level, provided
>> your collector supports it.
>>
>> simon
>>>>
>>>> We have been using Lucene for some time and our document count is growing
>>>> rather rapidly. Maybe this is a much-discussed issue already, but I did not
>>>> find a lead; any pointer would be really appreciated.
>>>>
>>>> Thanks very much for the help, Lisheng
>>>>
>>>>
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>>
>>>>
>>>
>>>
>>>
>>> --
>>> Lance Norskog
>>> goksron@gmail.com
>>
>
>
>
> --
> Lance Norskog
> goksron@gmail.com
>
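
To make the "absolute doc id" idea above concrete, here is a minimal
sketch in plain Java. It is not Lucene API: the class GlobalDocIds and
its methods toGlobal, subIndexOf and toLocal are hypothetical names made
up for illustration. It assumes each sub-index (or segment) reports an
int document count and that the sub-indexes are kept in a fixed order,
so a long "doc base" per sub-index plus the int local id gives an id
that can exceed Integer.MAX_VALUE.

```java
// Hypothetical helper, not part of Lucene: maps (sub-index, local int doc id)
// pairs to long "absolute" ids and back.
final class GlobalDocIds {
    // docBases[i] is the absolute id of the first document in sub-index i;
    // docBases[n] is the total document count across all n sub-indexes.
    private final long[] docBases;

    GlobalDocIds(int[] maxDocs) {
        docBases = new long[maxDocs.length + 1];
        for (int i = 0; i < maxDocs.length; i++) {
            if (maxDocs[i] < 0) {
                throw new IllegalArgumentException("negative maxDoc at " + i);
            }
            // long arithmetic, so the running sum cannot overflow an int
            docBases[i + 1] = docBases[i] + maxDocs[i];
        }
    }

    long totalDocs() {
        return docBases[docBases.length - 1];
    }

    // Absolute id = start of the sub-index + id local to that sub-index.
    long toGlobal(int subIndex, int localDoc) {
        long size = docBases[subIndex + 1] - docBases[subIndex];
        if (localDoc < 0 || localDoc >= size) {
            throw new IllegalArgumentException("local doc id out of range");
        }
        return docBases[subIndex] + localDoc;
    }

    // Binary search for the last sub-index whose doc base is <= globalDoc.
    int subIndexOf(long globalDoc) {
        if (globalDoc < 0 || globalDoc >= totalDocs()) {
            throw new IllegalArgumentException("global doc id out of range");
        }
        int lo = 0;
        int hi = docBases.length - 2;
        while (lo < hi) {
            int mid = (lo + hi + 1) >>> 1;
            if (docBases[mid] <= globalDoc) {
                lo = mid;
            } else {
                hi = mid - 1;
            }
        }
        return lo;
    }

    // The result fits in an int because each sub-index holds an int count.
    int toLocal(long globalDoc) {
        return (int) (globalDoc - docBases[subIndexOf(globalDoc)]);
    }
}
```

A collector along the lines suggested above could call toGlobal with the
current sub-index and the per-segment doc id it is handed, and store the
resulting long. Whether that is worth doing instead of sharding is a
separate question, as noted in the thread.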

