lucene-dev mailing list archives

From James Kennedy <jk-pub...@troove.net>
Subject Re: [jira] Field constructor, avoiding String.intern()
Date Fri, 23 Feb 2007 18:28:19 GMT

True. However, in the case where you are processing Documents one at a time
and discarding them (e.g., we use a HitCollector to process all documents from
a search), or where memory is not an issue, it would be nice to have the
ability to disable the interning for performance's sake.
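
As a rough illustration of what such an option could look like, here is a minimal sketch. SketchField is a hypothetical stand-in, not Lucene's Field class, and the internName flag is an assumed parameter that the current constructor does not have. The point is only that a caller who skips interning gives up the == shortcut on names and has to fall back to equals().

```java
import java.util.Objects;

/** Hypothetical stand-in for a field class; NOT Lucene's actual Field API. */
final class SketchField {
    private final String name;
    private final String value;

    /** Default keeps today's behaviour: the name is always interned. */
    SketchField(String name, String value) {
        this(name, value, true);
    }

    /** internName is an assumed opt-out flag, shown for illustration only. */
    SketchField(String name, String value, boolean internName) {
        Objects.requireNonNull(name, "name");
        this.name = internName ? name.intern() : name;
        this.value = value;
    }

    String name() { return name; }
    String value() { return value; }

    /** Identity check first (cheap when interned), equals() as the fallback. */
    boolean hasName(String other) {
        return name == other || name.equals(other);
    }
}

class SketchFieldDemo {
    public static void main(String[] args) {
        // new String(...) forces an instance distinct from the pooled literal.
        String runtimeName = new String("title");

        SketchField interned = new SketchField(runtimeName, "a");
        SketchField raw = new SketchField(runtimeName, "a", false);

        System.out.println(interned.name() == "title"); // true
        System.out.println(raw.name() == "title");      // false
        System.out.println(raw.hasName("title"));       // true
    }
}
```

Whether skipping intern() is a net win would still depend on how many name comparisons happen per Field afterwards, so this would need measuring in each use case.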




Robert Engels wrote:
> 
> I don't think it is just the performance gain of equals() where
> intern() matters.
> 
> It also reduces memory consumption dramatically when working with  
> large collections of documents in memory - although this could also  
> be done with constants, there is nothing in Java to enforce it (thus  
> the use of intern()).
> 
> 
> On Feb 23, 2007, at 12:02 PM, James Kennedy wrote:
> 
>>
>> In our case, we're trying to optimize document() retrieval, and we found
>> that disabling the String interning in the Field constructor improved
>> performance dramatically. I agree that interning should be an option on
>> the constructor. For document retrieval, at least for a small number of
>> fields, the performance gain of using equals() on interned strings is no
>> match for the performance loss of interning the field name of each field.
>>
>>
>>
>> Wolfgang Hoschek-2 wrote:
>>>
>>> I noticed that, too, but in my case the difference was often much
>>> more extreme: it was one of the primary bottlenecks on indexing. This
>>> is the primary reason why MemoryIndex.addField(...) navigates around
>>> the problem by taking a parameter of type "String fieldName" instead
>>> of type "Field":
>>>
>>> 	public void addField(String fieldName, TokenStream stream) {
>>> 		/*
>>> 		 * Note that this method signature avoids having a user call new
>>> 		 * o.a.l.d.Field(...) which would be much too expensive due to the
>>> 		 * String.intern() usage of that class.
>>> 		 */
>>>
>>> Wolfgang.
>>>
>>> On Feb 14, 2006, at 1:42 PM, Tatu Saloranta wrote:
>>>
>>>> After profiling in-memory indexing, I noticed that
>>>> calls to String.intern() showed up surprisingly high,
>>>> especially the one from the Field() constructor. This is
>>>> understandable given the overhead String.intern() has
>>>> (it is a native and synchronized method, and the overhead
>>>> is incurred even if the String is already interned), and
>>>> the fact that this essentially gets called once per
>>>> document+field combination.
>>>>
>>>> Now, it would be quite easy to improve things a bit
>>>> (in theory), such that most intern() calls could be
>>>> avoided, transparently to the calling app; for example,
>>>> for each IndexWriter() one could use a simple
>>>> HashMap() for caching interned Strings. This approach
>>>> is more than twice as fast as directly calling
>>>> intern(). One could also use a per-thread cache, or a
>>>> global one; all of which would probably be faster.
>>>> However, the Field constructor hard-codes the call to
>>>> intern(), so it would be necessary to add a new
>>>> constructor that indicates that the field name is known
>>>> to be interned.
>>>> And there would also need to be a way to invoke the
>>>> new optional functionality.
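
For reference, the HashMap-based cache described above could be sketched roughly like this. InternCache is a hypothetical helper name, not existing Lucene code, and the sketch assumes one instance per writer used from a single thread (a plain HashMap is not thread-safe). It only pays for String.intern() on a cache miss.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical per-writer cache of interned Strings; not part of Lucene. */
final class InternCache {
    // Maps a String to its canonical (interned) instance.
    private final Map<String, String> cache = new HashMap<String, String>();

    /** Returns the interned instance, calling String.intern() only on a miss. */
    String intern(String s) {
        String canonical = cache.get(s);
        if (canonical == null) {
            canonical = s.intern();
            cache.put(canonical, canonical);
        }
        return canonical;
    }

    /** Number of distinct names seen so far. */
    int size() {
        return cache.size();
    }
}

class InternCacheDemo {
    public static void main(String[] args) {
        InternCache cache = new InternCache();

        // Two distinct instances with the same contents.
        String a = cache.intern(new String("body"));
        String b = cache.intern(new String("body"));

        System.out.println(a == b);       // true
        System.out.println(a == "body");  // true
        System.out.println(cache.size()); // 1
    }
}
```

A lookup still hashes and compares the String, so whether this beats a direct intern() call would have to be confirmed by profiling on the JVM in question.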
>>>>
>>>> Has anyone tried this approach to see if the speedup is
>>>> worth the hassle (in my case it'd probably be
>>>> something like 2 - 3%, assuming the profiler's 5% for
>>>> intern() is accurate)?
>>>>
>>>> -+ Tatu +-
>>>>
>>>>
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>>>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>>>
>>>
>>>
>>
>> -- 
>> View this message in context:
>> http://www.nabble.com/Field-constructor%2C-avoiding-String.intern%28%29-tf1123597.html#a9123600
>> Sent from the Lucene - Java Developer mailing list archive at Nabble.com.
>>
>>
> 
> 

-- 
View this message in context: http://www.nabble.com/Field-constructor%2C-avoiding-String.intern%28%29-tf1123597.html#a9124055
Sent from the Lucene - Java Developer mailing list archive at Nabble.com.

