lucene-java-user mailing list archives

From Lance Norskog <>
Subject Re: Sort runs out of memory
Date Wed, 23 May 2012 22:27:09 GMT
The Trie type can be tuned for range queries vs. single-value queries. This
seems to be explained in email and nowhere else:

On Mon, May 21, 2012 at 12:54 AM, Toke Eskildsen <> wrote:
> On Thu, 2012-05-17 at 23:03 +0200, Robert Bart wrote:
>> I am running Lucene 3.6 in a system that indexes about 4 billion documents
>> across several indexes, and I'm hoping to get documents in order of a
>> certain NumericField.
> What is the maximum size on any single index, in terms of number of
> documents? What is the type of the NumericField?
>> I've tried using Lucene's Sort implementation, but it looks like it tries
>> to do the entire sort in memory by allocating a huge array with space for
>> every document in the index.
> The FieldCache allocates an array of length #documents with the same
> type as your NumericField. The sort itself is of the sliding-window
> type, meaning that it only takes up memory linear in the number of
> documents wanted in the response. Do you require millions of documents
> to be returned as part of a search?
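A minimal sketch of the sliding-window idea described above, assuming the sort values already sit in a plain int array indexed by document id (as the FieldCache array would). The class TopKSketch and its topK method are hypothetical names for illustration, not Lucene's own collector code. The heap never holds more than k entries, so the extra memory depends on the number of results requested, not on the index size.

```java
import java.util.Arrays;
import java.util.PriorityQueue;

// Hypothetical helper, not part of Lucene: keeps only the k best
// documents while scanning every sort value exactly once.
class TopKSketch {

    /** Returns the doc ids of the k smallest values, in ascending value order. */
    static int[] topK(int[] values, int k) {
        // Max-heap on the sort value: the head is the worst candidate kept so far.
        PriorityQueue<Integer> heap = new PriorityQueue<>(
                Math.max(1, k), (a, b) -> Integer.compare(values[b], values[a]));

        for (int doc = 0; doc < values.length; doc++) {
            if (heap.size() < k) {
                heap.add(doc);
            } else if (k > 0 && values[doc] < values[heap.peek()]) {
                heap.poll();   // evict the current worst candidate
                heap.add(doc); // and keep the better one
            }
        }

        // Drain the heap from worst to best, filling the result back to front.
        int[] result = new int[heap.size()];
        for (int i = result.length - 1; i >= 0; i--) {
            result[i] = heap.poll();
        }
        return result;
    }

    public static void main(String[] args) {
        int[] values = {42, 7, 19, 3, 88, 11};
        System.out.println(Arrays.toString(topK(values, 3)));
    }
}
```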
> Sanity check: You do specify the type when performing a sorted search,
> right? If not, the values will be treated as Strings.
>> On my index, this quickly runs out of memory.
> Assuming that your largest index is 1B documents and that your
> NumericField is of type Integer, the FieldCache's values for the sort
> should take up 1B * 4 = 4GB. Are you hoping for less?
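The estimate above is simply documents times bytes per value. A small sketch of that arithmetic for the primitive widths, assuming one array slot per document and ignoring JVM object overhead; FieldCacheEstimate is a hypothetical name used only for this example.

```java
// Hypothetical helper: rough size of a one-value-per-document array.
class FieldCacheEstimate {

    /** Bytes needed for numDocs values of the given width; throws on overflow. */
    static long bytes(long numDocs, int bytesPerValue) {
        return Math.multiplyExact(numDocs, (long) bytesPerValue);
    }

    public static void main(String[] args) {
        long numDocs = 1_000_000_000L; // the 1B-document index assumed in the thread
        String[] names = {"byte", "short", "int", "long"};
        int[] widths = {1, 2, 4, 8};
        for (int i = 0; i < names.length; i++) {
            double gb = bytes(numDocs, widths[i]) / 1e9; // decimal gigabytes
            System.out.println(names[i] + ": " + gb + " GB");
        }
    }
}
```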
>> Are there any alternatives or better ways of getting documents in order of
>> a NumericField for a very large index?
> Be sure to select the type of NumericField to be as small as possible.
> If you have few unique sort values (e.g. 17, 80, 2000 and 5678), you
> might map them down (to 0, 1, 2 and 3 for this example) and store them
> as a byte.
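A minimal sketch of the mapping suggested above, assuming at most 256 distinct sort values. The dictionary is kept sorted, so comparing ordinals gives the same order as comparing the original values. OrdinalMapSketch and its methods are hypothetical names, and the code is plain Java, not a Lucene API.

```java
import java.util.Arrays;

// Hypothetical helper: maps a small set of distinct values to byte ordinals.
class OrdinalMapSketch {

    /** Sorted distinct values; a value's index in this array is its ordinal. */
    static int[] buildDictionary(int[] values) {
        return Arrays.stream(values).distinct().sorted().toArray();
    }

    /** Replaces each value by its ordinal, stored in a single byte. */
    static byte[] encode(int[] values, int[] dictionary) {
        if (dictionary.length > 256) {
            throw new IllegalArgumentException("more than 256 distinct values");
        }
        byte[] ords = new byte[values.length];
        for (int i = 0; i < values.length; i++) {
            int ord = Arrays.binarySearch(dictionary, values[i]);
            if (ord < 0) {
                throw new IllegalArgumentException("value not in dictionary: " + values[i]);
            }
            ords[i] = (byte) ord;
        }
        return ords;
    }

    /** Java bytes are signed, so mask to recover ordinals above 127. */
    static int decode(byte ord, int[] dictionary) {
        return dictionary[ord & 0xFF];
    }

    public static void main(String[] args) {
        int[] values = {2000, 17, 5678, 80, 17};
        int[] dictionary = buildDictionary(values);
        byte[] ords = encode(values, dictionary);
        System.out.println(Arrays.toString(dictionary));
        System.out.println(Arrays.toString(ords));
    }
}
```

When sorting on the stored bytes directly, compare them as unsigned (ord & 0xFF), otherwise ordinals 128 to 255 would sort before 0.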
> Currently Lucene only supports atomic types for numerics in the
> FieldCache, so the smallest one is byte. It is possible to use only
> ceil(log2(#unique_values)) bits/document, although that requires a bit
> of custom coding.
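One way the custom coding mentioned above could look: ordinals packed into a long[] using ceil(log2(#unique_values)) bits each. This is a sketch under the assumption that values have already been mapped to ordinals 0..uniqueValues-1; PackedOrdinalsSketch is a hypothetical class, not something Lucene 3.6 provides for the FieldCache. With 4 unique values and 1B documents it would need 2 bits per document, about 250 MB instead of 1 GB for a byte array.

```java
// Hypothetical helper: fixed-width bit packing of per-document ordinals.
class PackedOrdinalsSketch {
    private final long[] blocks;
    private final int bitsPerValue;
    private final long mask;

    PackedOrdinalsSketch(int numDocs, int uniqueValues) {
        if (numDocs < 0 || uniqueValues < 1) {
            throw new IllegalArgumentException("need numDocs >= 0 and uniqueValues >= 1");
        }
        // ceil(log2(uniqueValues)), with a minimum of one bit per value.
        this.bitsPerValue = Math.max(1, 32 - Integer.numberOfLeadingZeros(uniqueValues - 1));
        this.mask = (1L << bitsPerValue) - 1;
        long totalBits = (long) numDocs * bitsPerValue;
        this.blocks = new long[(int) ((totalBits + 63) / 64)];
    }

    int bitsPerValue() {
        return bitsPerValue;
    }

    long memoryBytes() {
        return blocks.length * 8L;
    }

    void set(int doc, int ord) {
        if (ord < 0 || ord > mask) {
            throw new IllegalArgumentException("ordinal out of range: " + ord);
        }
        long bitPos = (long) doc * bitsPerValue;
        int block = (int) (bitPos >>> 6);
        int shift = (int) (bitPos & 63);
        blocks[block] = (blocks[block] & ~(mask << shift)) | ((long) ord << shift);
        int spill = shift + bitsPerValue - 64;
        if (spill > 0) {
            // The value straddles two longs: write its high bits into the next one.
            int lowBits = bitsPerValue - spill;
            blocks[block + 1] = (blocks[block + 1] & ~(mask >>> lowBits))
                    | ((long) ord >>> lowBits);
        }
    }

    int get(int doc) {
        long bitPos = (long) doc * bitsPerValue;
        int block = (int) (bitPos >>> 6);
        int shift = (int) (bitPos & 63);
        long value = blocks[block] >>> shift;
        int spill = shift + bitsPerValue - 64;
        if (spill > 0) {
            // Fetch the high bits from the next long.
            value |= blocks[block + 1] << (bitsPerValue - spill);
        }
        return (int) (value & mask);
    }

    public static void main(String[] args) {
        PackedOrdinalsSketch packed = new PackedOrdinalsSketch(100, 5);
        for (int doc = 0; doc < 100; doc++) {
            packed.set(doc, doc % 5);
        }
        System.out.println(packed.bitsPerValue() + " bits/doc, "
                + packed.memoryBytes() + " bytes total");
    }
}
```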
> Regards,
> Toke Eskildsen

Lance Norskog
