lucene-java-user mailing list archives

From Koji Sekiguchi <k...@r.email.ne.jp>
Subject Re: FieldCache memory estimation - term values are interned?
Date Sun, 02 May 2010 00:23:21 GMT
Yonik Seeley wrote:
> 2010/4/30 Koji Sekiguchi <koji@r.email.ne.jp>:
>   
>> Are Strings that are got via FieldCache.DEFAULT.getStrings( reader,
>> field ) interned?
>>
>> Since I have a requirement for having FieldCaches of some
>> fields in 250M docs index, I'd like to estimate memory
>> consumed by FieldCache.
>>
>> By looking at FieldCacheImpl source code, it seems that
>> field names are interned, but values are not?
>>     
>
> Values are not interned, but in a single field cache entry (String[])
> the same String object is used for all docs with that same value.
>
>   
Yes, you are right. I asked because any two Strings with the same value in
the String[] returned by getStrings() turned out to be identical (== is true),
yet I couldn't find an explicit intern() call for them. Can you elaborate on
where that sharing is done?
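
The observation above (== is true without any intern() call) can be reproduced
in plain Java. The following is only a minimal sketch, not the FieldCacheImpl
code: the class name, the method name and the Map standing in for the inverted
index are all hypothetical. It assumes the array is filled term by term, so one
String object per distinct term is stored for every document containing it.

```java
import java.util.Map;
import java.util.TreeMap;

public class SharedValueSketch {
    // Hypothetical stand-in for an inverted index: term text -> ids of docs with that term.
    public static String[] fillPerDocValues(Map<String, int[]> postings, int maxDoc) {
        String[] values = new String[maxDoc];
        for (Map.Entry<String, int[]> entry : postings.entrySet()) {
            String termText = entry.getKey();   // one String object per distinct term
            for (int doc : entry.getValue()) {
                values[doc] = termText;         // the same reference is stored for each doc
            }
        }
        return values;
    }

    public static void main(String[] args) {
        Map<String, int[]> postings = new TreeMap<>();
        // new String(...) guarantees these keys are not the interned literals.
        postings.put(new String("female"), new int[] {0, 3});
        postings.put(new String("male"), new int[] {1, 2, 4});

        String[] values = fillPerDocValues(postings, 5);
        System.out.println(values[0] == values[3]);   // true: same object within the array
        System.out.println(values[0] == "female");    // false: the value was never interned
    }
}
```

Under that assumption, identity holds only inside one array; a second array
built from a different Map would hold different String objects for equal values.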

> But... I think StringIndex is more commonly used in both Lucene and
> Solr than String[] (sorting, faceting, etc) so double check that it's
> not StringIndex you should be looking at.
>
>   
Yes, I think StringIndex is also better than String[] for my requirement in
terms of memory consumption (an extreme case would be a male/female field).
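
To make the comparison concrete, here is a rough sketch of a StringIndex-like
layout in plain Java: a sorted array of distinct values plus one int per
document. It is modeled on the lookup/order pair of StringIndex but is not the
Lucene class; the class name, the constructor and the perDocArrayBytes helper
are hypothetical, and the bytes-per-slot figures (4 for an int, 4 or 8 for an
object reference depending on the JVM) are assumptions that ignore array
headers and the String objects themselves.

```java
import java.util.Arrays;
import java.util.TreeSet;

public class StringIndexSketch {
    public final String[] lookup; // sorted distinct values; slot 0 means "no value"
    public final int[] order;     // per document: index into lookup

    public StringIndexSketch(String[] perDocValues) {
        TreeSet<String> distinct = new TreeSet<>();
        for (String v : perDocValues) {
            if (v != null) distinct.add(v);
        }
        lookup = new String[distinct.size() + 1];
        int slot = 1;
        for (String v : distinct) lookup[slot++] = v;

        order = new int[perDocValues.length];
        for (int doc = 0; doc < perDocValues.length; doc++) {
            String v = perDocValues[doc];
            // Search only slots 1..end, because slot 0 holds null.
            order[doc] = (v == null) ? 0 : Arrays.binarySearch(lookup, 1, lookup.length, v);
        }
    }

    public String value(int doc) {
        return lookup[order[doc]];
    }

    // Rough size of a per-document array: one slot per doc, headers ignored.
    public static long perDocArrayBytes(long maxDoc, int bytesPerSlot) {
        return maxDoc * bytesPerSlot;
    }

    public static void main(String[] args) {
        StringIndexSketch idx =
            new StringIndexSketch(new String[] {"male", "female", null, "male"});
        System.out.println(idx.value(3) + " has ord " + idx.order[3]);

        long maxDoc = 250_000_000L;
        System.out.println("int[] order:         " + perDocArrayBytes(maxDoc, 4) + " bytes");
        System.out.println("String[] 8-byte refs: " + perDocArrayBytes(maxDoc, 8) + " bytes");
    }
}
```

With these assumptions, the int[] for 250M docs takes about 1 GB per field,
and a String[] takes the same or twice that depending on reference size. In
both layouts the distinct String objects are stored once, so for a male/female
field their cost is negligible next to the per-document array.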

Thank you,

Koji

-- 
http://www.rondhuit.com/en/



