lucene-dev mailing list archives

From robert engels <reng...@ix.netcom.com>
Subject Re: [jira] Field constructor, avoiding String.intern()
Date Fri, 23 Feb 2007 18:11:20 GMT
I don't think it is just the performance gain of equals() where
intern() matters.

It also reduces memory consumption dramatically when working with  
large collections of documents in memory - although this could also  
be done with constants, there is nothing in Java to enforce it (thus  
the use of intern()).
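
As a rough illustration of the memory point (a minimal sketch; the class
and method names are made up for this example and are not Lucene code):
two equal field-name Strings built separately are distinct objects, while
their interned forms are a single shared instance, so many in-memory
documents can reference one copy of each name.

```java
// Hypothetical demo class, not part of Lucene.
class InternDemo {
    // Two separately allocated Strings with equal content are different objects.
    static boolean distinctBeforeIntern(String name) {
        String a = new String(name);
        String b = new String(name);
        return a != b;
    }

    // After intern(), both references point at the same canonical instance.
    static boolean sharedAfterIntern(String name) {
        String a = new String(name).intern();
        String b = new String(name).intern();
        return a == b;
    }

    public static void main(String[] args) {
        System.out.println(distinctBeforeIntern("title")); // true
        System.out.println(sharedAfterIntern("title"));    // true
    }
}
```

Sharing constants by hand gives the same effect, but only as long as every
caller actually reuses the constant.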


On Feb 23, 2007, at 12:02 PM, James Kennedy wrote:

>
> In our case, we're trying to optimize document() retrieval and we  
> found that
> disabling the String interning in the Field constructor improved  
> performance
> dramatically. I agree that interning should be an option on the  
> constructor.
> For document retrieval, at least for a small number of fields, the
> performance gain of using equals() on interned strings is no match  
> for the
> performance loss of interning the field name of each field.
>
>
>
> Wolfgang Hoschek-2 wrote:
>>
>> I noticed that, too, but in my case the difference was often much
>> more extreme: it was one of the primary bottlenecks on indexing. This
>> is the primary reason why MemoryIndex.addField(...) navigates around
>> the problem by taking a parameter of type "String fieldName" instead
>> of type "Field":
>>
>> 	public void addField(String fieldName, TokenStream stream) {
>> 		/*
>> 		 * Note that this method signature avoids having a user call new
>> 		 * o.a.l.d.Field(...) which would be much too expensive due to the
>> 		 * String.intern() usage of that class.
>> 		 */
>>
>> Wolfgang.
>>
>> On Feb 14, 2006, at 1:42 PM, Tatu Saloranta wrote:
>>
>>> After profiling in-memory indexing, I noticed that
>>> calls to String.intern() showed up surprisingly high;
>>> especially the one from Field() constructor. This is
>>> understandable given the overhead String.intern() has
>>> (it is a native and synchronized method, and the overhead
>>> is incurred even if the String is already interned), and the
>>> fact that this essentially gets called once per
>>> document+field combination.
>>>
>>> Now, it would be quite easy to improve things a bit
>>> (in theory), such that most intern() calls could be
>>> avoided, transparently to the calling app; for example,
>>> for each IndexWriter() one could use a simple
>>> HashMap() for caching interned Strings. This approach
>>> is more than twice as fast as directly calling
>>> intern(). One could also use a per-thread cache, or a
>>> global one; all of which would probably be faster.
>>> However, the Field constructor hard-codes the call to
>>> intern(), so it would be necessary to add a new
>>> constructor that indicates that the field name is known to
>>> be interned.
>>> And there would also need to be a way to invoke the
>>> new optional functionality.
>>>
>>> Has anyone tried this approach to see if the speedup is
>>> worth the hassle (in my case it'd probably be
>>> something like 2 - 3%, assuming the profiler's 5% for
>>> intern() is accurate)?
>>>
>>> -+ Tatu +-
>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>>
>>
>>
>>
>>
>
> -- 
> View this message in context:
> http://www.nabble.com/Field-constructor%2C-avoiding-String.intern%28%29-tf1123597.html#a9123600
> Sent from the Lucene - Java Developer mailing list archive at Nabble.com.
>
>
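
For reference, the per-writer HashMap cache suggested in the quoted
message from Tatu could look roughly like the following. This is only a
sketch under stated assumptions: FieldNameCache is a hypothetical helper,
not an existing Lucene class, and it is not thread-safe, so it would have
to be confined to a single writer or thread.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of Lucene: caches canonical field names so
// that String.intern() is called only once per distinct name.
class FieldNameCache {
    private final Map<String, String> cache = new HashMap<>();

    // Returns the interned instance for the given name.
    // Not thread-safe: intended for use from a single thread.
    String intern(String name) {
        String canonical = cache.get(name);
        if (canonical == null) {
            canonical = name.intern();   // slow path, first sighting only
            cache.put(canonical, canonical);
        }
        return canonical;
    }

    // Number of distinct names seen so far.
    int size() {
        return cache.size();
    }

    public static void main(String[] args) {
        FieldNameCache names = new FieldNameCache();
        String a = names.intern(new String("title"));
        String b = names.intern(new String("title"));
        System.out.println(a == b); // true
    }
}
```

The lookup still hashes and compares the name on every call, so whether it
beats a direct intern() call depends on the JVM and the number of distinct
field names; it would need to be measured.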

