lucene-dev mailing list archives

From James Kennedy <jk-pub...@troove.net>
Subject Re: [jira] Field constructor, avoiding String.intern()
Date Fri, 23 Feb 2007 18:02:38 GMT

In our case, we're trying to optimize document() retrieval and we found that
disabling the String interning in the Field constructor improved performance
dramatically. I agree that interning should be an option on the constructor.
For document retrieval, at least for a small number of fields, the
performance gain of using equals() on interned strings is no match for the
performance loss of interning the field name of each field.
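
For readers unfamiliar with the trade-off, here is a minimal sketch of what interning buys: two equal strings are normally distinct objects, but their interned forms are the same object, so a reference comparison is enough. The class and method names are made up for illustration and are not Lucene code.

```java
// Illustration only: InternDemo and sameReference are hypothetical names.
public class InternDemo {

    // True only when both arguments are the very same object.
    static boolean sameReference(String a, String b) {
        return a == b;
    }

    public static void main(String[] args) {
        String literal = "title";            // string literals are interned by the JVM
        String copy = new String("title");   // equal content, but a distinct object

        System.out.println(sameReference(copy, literal));          // false
        System.out.println(copy.equals(literal));                  // true
        System.out.println(sameReference(copy.intern(), literal)); // true
    }
}
```

The cheap reference check only pays off if it is used often enough to amortize the intern() call made for every field name.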



Wolfgang Hoschek-2 wrote:
> 
> I noticed that, too, but in my case the difference was often much  
> more extreme: it was one of the primary bottlenecks on indexing. This  
> is the primary reason why MemoryIndex.addField(...) navigates around  
> the problem by taking a parameter of type "String fieldName" instead  
> of type "Field":
> 
> 	public void addField(String fieldName, TokenStream stream) {
> 		/*
> 		 * Note that this method signature avoids having a user call new
> 		 * o.a.l.d.Field(...) which would be much too expensive due to the
> 		 * String.intern() usage of that class.
> 		 */
> 
> Wolfgang.
> 
> On Feb 14, 2006, at 1:42 PM, Tatu Saloranta wrote:
> 
>> After profiling in-memory indexing, I noticed that
>> calls to String.intern() showed up surprisingly high;
>> especially the one from Field() constructor. This is
>> understandable due to overhead String.intern() has
>> (being native and synchronized method; overhead
>> incurred even if String is already interned), and the
>> fact this essentially gets called once per
>> document+field combination.
>>
>> Now, it would be quite easy to improve things a bit
>> (in theory), such that most intern() calls could be
>> avoided, transparently to the calling app; for example,
>> for each IndexWriter() one could use a simple
>> HashMap() for caching interned Strings. This approach
>> is more than twice as fast as directly calling
>> intern(). One could also use a per-thread cache, or a
>> global one; all of which would probably be faster.
>> However, the Field constructor hard-codes the call to
>> intern(), so it would be necessary to add a new
>> constructor that indicates that the field name is known to
>> be interned.
>> And there would also need to be a way to invoke the
>> new optional functionality.
>>
>> Has anyone tried this approach to see if speedup is
>> worth the hassle (in my case it'd probably be
>> something like 2 - 3%, assuming profiler's 5% for
>> intern() is accurate)?
>>
>> -+ Tatu +-
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>
> 
> 
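
The HashMap cache Tatu describes in the quoted message could look roughly like the sketch below. FieldNameCache is a hypothetical helper, not part of Lucene, and as noted in the thread it would only help if Field had a constructor that accepts an already-interned name. The sketch assumes one cache per writer used from a single thread; a shared cache would need synchronization or a concurrent map.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not a Lucene class: calls String.intern() only on a cache miss.
public final class FieldNameCache {

    // Maps each field name to its canonical interned instance.
    private final Map<String, String> cache = new HashMap<>();

    // Returns the interned instance for the given name.
    public String intern(String name) {
        String cached = cache.get(name);
        if (cached == null) {
            cached = name.intern();     // the expensive call happens once per distinct name
            cache.put(cached, cached);
        }
        return cached;
    }

    // Number of distinct names seen so far.
    public int size() {
        return cache.size();
    }

    public static void main(String[] args) {
        FieldNameCache names = new FieldNameCache();
        String first = names.intern(new String("title"));
        String second = names.intern(new String("title"));
        System.out.println(first == second);  // true: both are the cached instance
        System.out.println(names.size());     // 1
    }
}
```

Each lookup still costs a hashCode() and an equals() call, so whether this beats a direct intern() depends on the JVM; the "more than twice as fast" figure is Tatu's measurement, not something this sketch demonstrates.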

-- 
View this message in context: http://www.nabble.com/Field-constructor%2C-avoiding-String.intern%28%29-tf1123597.html#a9123600
Sent from the Lucene - Java Developer mailing list archive at Nabble.com.

