On Wed, Feb 4, 2009 at 10:41 AM, Mark Miller <markrmiller@gmail.com> wrote:
> Todd Benge wrote:
>
>>
>> The intent is to reduce the amount of memory that is held in cache. As it
>> is now, it looks like there is an array of comparators for each index
>> reader. Most of the data in the array appears to be the same for each
>> cache, so there is duplication for each type (string, float).
>>
>>
> Use an array cachekey and override it as not mergeable.
>
> I suppose in terms of the unique terms array, you could see some
> duplication.
>
> I don't think there should be much duplication though - in the non String
> cases, each SegmentIndexReader will only hold the values for itself. The
> combined size of the sub arrays would be the same as the full array.
> In the String case, you will have duplicates for the unique terms array, so
> if you have a lot, that may cause issues, but the ordinal array will not be
> any larger. And the unique terms array shouldn't be terrible - the number of
> terms per segment should drop logarithmically. I'm not sure you'll see much
> of a difference, and it would only be with String sorts.
>
> That is, unless you are creating your own separate FieldCaches on
> MultiSegmentReaders - then you would double everything.
>
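To make the String case above concrete, here is a rough sketch of where the duplication would come from. It is not Lucene code: StringIndexSketch and its nested StringIndex are hypothetical stand-ins that assume a string sort cache holds one ordinal per document plus a sorted array of unique terms, and the sample values are made up. It only illustrates that splitting documents across segments leaves the total number of ordinals unchanged while terms shared between segments are stored once per segment.

```java
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

public class StringIndexSketch {

    /** Hypothetical per-reader cache entry: ordinals per document plus sorted unique terms. */
    static final class StringIndex {
        final int[] order;      // order[doc] = position of that doc's term in lookup
        final String[] lookup;  // sorted unique terms seen by this reader

        StringIndex(int[] order, String[] lookup) {
            this.order = order;
            this.lookup = lookup;
        }
    }

    /** Builds the sketch cache for one reader from its per-document field values. */
    static StringIndex build(String[] docValues) {
        TreeSet<String> unique = new TreeSet<String>(Arrays.asList(docValues));
        String[] lookup = unique.toArray(new String[0]);
        int[] order = new int[docValues.length];
        for (int doc = 0; doc < docValues.length; doc++) {
            // lookup is sorted, so binary search finds the term's ordinal
            order[doc] = Arrays.binarySearch(lookup, docValues[doc]);
        }
        return new StringIndex(order, lookup);
    }

    /** Total ordinal slots held across all readers. */
    static int totalOrdinals(List<StringIndex> readers) {
        int n = 0;
        for (StringIndex r : readers) n += r.order.length;
        return n;
    }

    /** Total term slots held across all readers (shared terms counted once per reader). */
    static int totalTerms(List<StringIndex> readers) {
        int n = 0;
        for (StringIndex r : readers) n += r.lookup.length;
        return n;
    }

    public static void main(String[] args) {
        // Made-up field values for six documents.
        String[] all = {"apple", "pear", "apple", "fig", "pear", "apple"};

        // One cache over the whole index versus one cache per segment.
        List<StringIndex> whole = Arrays.asList(build(all));
        List<StringIndex> perSegment = Arrays.asList(
                build(Arrays.copyOfRange(all, 0, 3)),
                build(Arrays.copyOfRange(all, 3, 6)));

        System.out.println("ordinals: whole=" + totalOrdinals(whole)
                + " perSegment=" + totalOrdinals(perSegment));
        System.out.println("terms:    whole=" + totalTerms(whole)
                + " perSegment=" + totalTerms(perSegment));
    }
}
```

With this sample data the ordinal count is the same either way, and only the unique terms grow, because "apple" and "pear" appear in both segments. How much that matters in practice would depend on how many terms the segments actually share.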
>
>> Yes - we're running about 80G in the indices so there's not enough RAM
>> for all the data in the fieldcache.
>>
>>
> That is a large index. Can you share how many documents?
I don't have the exact number but I think it's 200 - 250 million documents.
I'll see if I can get some more realistic numbers and re-post.
Thanks for the help.
Todd