lucene-dev mailing list archives

From Wolfgang Hoschek <wolfgang.hosc...@mac.com>
Subject Re: Field constructor, avoiding String.intern()
Date Tue, 14 Feb 2006 22:50:48 GMT
I noticed that, too, but in my case the difference was often much
more extreme: String.intern() was one of the primary bottlenecks
during indexing. This is the main reason why MemoryIndex.addField(...)
works around the problem by taking a parameter of type
"String fieldName" instead of type "Field":

	public void addField(String fieldName, TokenStream stream) {
		/*
		 * Note that this method signature avoids having a user call new
		 * o.a.l.d.Field(...) which would be much too expensive due to the
		 * String.intern() usage of that class.
		 */

Wolfgang.

On Feb 14, 2006, at 1:42 PM, Tatu Saloranta wrote:

> After profiling in-memory indexing, I noticed that
> calls to String.intern() showed up surprisingly high,
> especially the one from the Field() constructor. This is
> understandable given the overhead String.intern() has
> (it is a native, synchronized method, and the overhead
> is incurred even if the String is already interned), and
> the fact that it essentially gets called once per
> document+field combination.
>
> Now, it would be quite easy to improve things a bit
> (in theory), such that most intern() calls could be
> avoided, transparently to the calling app; for example,
> each IndexWriter could use a simple HashMap to cache
> interned Strings. This approach is more than twice as
> fast as calling intern() directly. One could also use a
> per-thread cache, or a global one; all of which would
> probably be faster.
> However, the Field constructor hard-codes the call to
> intern(), so it would be necessary to add a new
> constructor that indicates that the field name is known
> to be interned.
> And there would also need to be a way to invoke the
> new optional functionality.
>
> Has anyone tried this approach to see if the speedup is
> worth the hassle (in my case it'd probably be
> something like 2 - 3%, assuming the profiler's 5% for
> intern() is accurate)?
>
> -+ Tatu +-
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org
>
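For illustration, the per-IndexWriter HashMap cache that Tatu describes might look roughly like the Java sketch below. The class name InternCache and its methods are hypothetical and are not part of Lucene. The sketch assumes a single owning thread, and it only shows the caching idea; it does not measure the speedup quoted above.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical helper (not a Lucene class): remembers the canonical interned
 * instance of each field name so that String.intern() is called only once
 * per distinct name instead of once per document+field combination.
 */
public class InternCache {
    // Maps a field name to its interned instance. Not thread-safe: meant to
    // be owned by one writer, as in the per-IndexWriter variant.
    private final Map<String, String> cache = new HashMap<>();

    /** Returns the same reference that name.intern() would return. */
    public String intern(String name) {
        String interned = cache.get(name);
        if (interned == null) {
            interned = name.intern();      // slow path, once per distinct name
            cache.put(interned, interned);
        }
        return interned;
    }

    /** Number of distinct names seen so far. */
    public int size() {
        return cache.size();
    }

    public static void main(String[] args) {
        InternCache cache = new InternCache();
        String a = cache.intern(new String("title"));
        String b = cache.intern(new String("title"));
        System.out.println(a == b);        // true: same canonical instance
        System.out.println(a == "title");  // true: string literals are interned
        System.out.println(cache.size());  // 1
    }
}
```

Such a cache grows with the number of distinct field names, so it suits the usual case of a small, fixed set of names. As the message notes, it would only help if Field also offered a way to accept a name that is already known to be interned.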
