lucene-java-user mailing list archives

From Todd Benge <>
Subject Re: FieldCache Question
Date Wed, 04 Feb 2009 18:01:17 GMT
On Wed, Feb 4, 2009 at 10:41 AM, Mark Miller <> wrote:

> Todd Benge wrote:
>> The intent is to reduce the amount of memory that is held in cache. As it
>> is now, it looks like there is an array of comparators for each index
>> reader. Most of the data in the array appears to be the same for each
>> cache, so there is duplication for each type (string, float).
> Use an array cachekey and override it as not mergeable.
> I suppose in terms of the unique terms array, you could see some
> duplication.
> I don't think there should be much duplication though - in the non-String
> cases, each SegmentIndexReader will only hold the values for itself. The
> combined size of the per-segment arrays would be the same as the full array.
> In the String case, you will have duplicates for the unique terms array, so
> if you have a lot, that may cause issues, but the ordinal array will not be
> any larger. And the unique terms array shouldn't be terrible - the number of
> terms per segment should drop logarithmically. I'm not sure you'll see much
> of a difference, and it would only be with String sorts.
> That is, unless you are creating your own separate FieldCaches on
> MultiSegmentReaders - then you would double everything.
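A toy model can make the per-segment accounting concrete. The sketch below is a hypothetical illustration, not Lucene's API: each segment is just a list of per-document terms for the sort field, and the class and method names (`PerSegmentCacheModel`, `totalDocSlots`, etc.) are invented for this example. It assumes each segment caches one slot per document (a float/int value, or an ordinal for String sorts) plus, for String sorts, its own array of unique terms.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical toy model, not Lucene's API: each segment caches one slot per
// document (the float/int value, or the ordinal for String sorts) plus, for
// String sorts, its own array of unique terms.
final class PerSegmentCacheModel {

    // Each inner list is one segment; each element is one document's term for the sort field.
    static final List<List<String>> SAMPLE = List.of(
            List.of("apple", "banana", "apple", "cherry"),
            List.of("banana", "cherry", "date"),
            List.of("apple", "elderberry"));

    // Per-document slots summed over all segments; equals the size of one top-level array.
    static long totalDocSlots(List<List<String>> segments) {
        long total = 0;
        for (List<String> segment : segments) {
            total += segment.size();
        }
        return total;
    }

    // Unique-terms entries when every segment keeps its own terms array.
    static long sumOfPerSegmentUniqueTerms(List<List<String>> segments) {
        long total = 0;
        for (List<String> segment : segments) {
            total += new HashSet<>(segment).size();
        }
        return total;
    }

    // Unique-terms entries if a single array covered the whole index.
    static long globalUniqueTerms(List<List<String>> segments) {
        Set<String> all = new HashSet<>();
        for (List<String> segment : segments) {
            all.addAll(segment);
        }
        return all.size();
    }

    public static void main(String[] args) {
        System.out.println("per-document slots:       " + totalDocSlots(SAMPLE));
        System.out.println("per-segment unique terms: " + sumOfPerSegmentUniqueTerms(SAMPLE));
        System.out.println("index-wide unique terms:  " + globalUniqueTerms(SAMPLE));
    }
}
```

With the sample data, the per-document slots total 9 whether they are cached per segment or once for the whole index, while the per-segment terms arrays hold 8 entries versus 5 index-wide. The 3 extra entries are the String-sort duplication Mark describes; the per-document arrays themselves do not grow.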
>> Yes - we're running about 80 GB in the indices, so there's not enough RAM
>> for all the data in the FieldCache.
> That is a large index. Can you share how many documents?

I don't have the exact number but I think it's 200 - 250 million documents.
I'll see if I can get some more realistic numbers and re-post.
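For a rough sense of scale at 200-250 million documents, the sketch below computes a lower bound per cached field. It assumes one 4-byte `float` or `int` entry per document for each numeric field in the cache and ignores JVM array headers; a String sort would add at least a 4-byte ordinal per document plus its unique terms, so it costs no less. The class and method names are hypothetical.

```java
// Hypothetical back-of-the-envelope estimate; assumes one 4-byte entry per
// document for each cached float/int field and ignores JVM array headers.
final class FieldCacheEstimate {

    // Bytes per entry in a float[] or int[].
    static final int BYTES_PER_ENTRY = 4;

    // Lower-bound bytes for one per-document float/int (or ordinal) array.
    static long perDocArrayBytes(long maxDoc) {
        return maxDoc * BYTES_PER_ENTRY;
    }

    public static void main(String[] args) {
        long[] docCounts = {200_000_000L, 250_000_000L};
        for (long maxDoc : docCounts) {
            long bytes = perDocArrayBytes(maxDoc);
            System.out.printf("%,d docs -> %,d bytes (%.2f GiB) per float/int field%n",
                    maxDoc, bytes, bytes / (1024.0 * 1024 * 1024));
        }
    }
}
```

That works out to about 0.75 GiB per field at 200 million documents and about 0.93 GiB at 250 million, before counting any String terms, so a handful of sort fields can already exhaust a typical heap.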

Thanks for the help.

