From Tatu Saloranta <>
Subject Field constructor, avoiding String.intern()
Date Tue, 14 Feb 2006 21:42:53 GMT
After profiling in-memory indexing, I noticed that
calls to String.intern() showed up surprisingly high,
especially the one from the Field constructor. This is
understandable given the overhead String.intern() has
(it is a native, synchronized method, and the overhead
is incurred even if the String is already interned),
and the fact that it essentially gets called once per
document+field combination.

Now, it would be quite easy to improve things a bit
(in theory), such that most intern() calls could be
avoided, transparently to the calling app; for example,
each IndexWriter could use a simple HashMap for
caching interned Strings. This approach is more than
twice as fast as directly calling intern(). One could
also use a per-thread cache, or a global one; either
would probably also be faster than calling intern()
directly.
However, the Field constructor hard-codes the call to
intern(), so it would be necessary to add a new
constructor that indicates that the field name is
known to be interned.
And there would also need to be a way to invoke the
new optional functionality.
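
To make the HashMap idea concrete, here is a minimal
sketch of such a cache. The class name InternCache and
its intern() method are hypothetical (nothing like this
exists in Lucene as far as this message is concerned);
it assumes one instance per IndexWriter and access from
a single thread, since a plain HashMap is not
thread-safe.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical helper, not part of Lucene: caches the canonical
 * (interned) instance of each field name so that String.intern()
 * is only called the first time a given name is seen.
 * Not thread-safe; assumed to be owned by a single writer/thread.
 */
final class InternCache {
    private final Map<String, String> cache = new HashMap<String, String>();

    /** Returns the same instance that name.intern() would return. */
    String intern(String name) {
        String canonical = cache.get(name);
        if (canonical == null) {
            // First sighting of this name: pay for the real intern() once.
            canonical = name.intern();
            cache.put(canonical, canonical);
        }
        return canonical;
    }
}
```

The lookup still hashes and compares the name, so it is
not free, but String caches its hash code and the
expensive intern() call is only made once per distinct
field name. A Field constructor that accepts an
already-interned name would still be needed to take
advantage of this.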

Has anyone tried this approach to see if the speedup is
worth the hassle (in my case it'd probably be
something like 2 - 3%, assuming the profiler's 5% for
intern() is accurate)?

-+ Tatu +-

