lucene-dev mailing list archives

From Tatu Saloranta
Subject Re: TermFreqVector
Date Thu, 29 Apr 2004 04:14:54 GMT
On Wednesday 28 April 2004 10:20, Doug Cutting wrote:
> Christoph Goller wrote:
> > However outside the package only the public
> > constructor of Term can be called, which does the field.intern(). I
> > don't know how expensive the call to intern() really is. Maybe my
> > worries are irrelevant.
> String.intern() is just a hash table lookup in most cases, and Java hash
> tables are pretty fast.  The constructor and object allocation will
> easily dominate this.

Actually, there is unfortunately always JNI call overhead, since
intern() is a native method in the String class (at least in Sun's
implementation). I don't know how costly that is; maybe with newer JVMs it's
less of an issue. It'd be nice if it did indeed use Java HashMaps (or even
specialized symbol tables) and avoided JNI altogether. Hash tables are very
fast (esp. with JDK 1.4+), although a bit of a memory hog. Also, it'd be good
if a String object could just keep an 'isInterned' flag, so that intern()
returns the object itself when it is already the result of intern() (or is a
String constant; those are always interned when classes are loaded).
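
To illustrate the point about constants, here is a minimal sketch (the class
and method names are made up for this example). It only shows the identity
semantics of intern(); note that the isCanonical() helper still has to call
intern() itself, which is exactly the cost an 'isInterned' flag would avoid.

```java
final class InternDemo {
    // True when s is already the canonical (interned) instance.
    // Hypothetical helper: it still pays for the intern() call.
    static boolean isCanonical(String s) {
        return s == s.intern();
    }

    public static void main(String[] args) {
        // A compile-time constant is interned when the class is loaded/resolved.
        String literal = "contents";
        // Same characters, but a fresh heap object that is not the pooled one.
        String built = new String(literal.toCharArray());

        System.out.println(isCanonical(literal));        // true
        System.out.println(isCanonical(built));          // false
        System.out.println(built.intern() == literal);   // true
    }
}
```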
The JNI overhead is one reason why many XML parsers use specialized symbol
tables rather than plain intern() (another reason is that this way the lookup
can be done directly from a char[], without having to construct an
intermediate String).
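
As a rough sketch of what such a symbol table might look like (this is not
taken from any particular parser; the class name, method names and sizing
policy are all assumptions), the lookup hashes and compares the characters in
place and only allocates a String on a miss:

```java
final class CharArraySymbolTable {
    // Open-addressing table; the length is always a power of two.
    private String[] buckets = new String[64];
    private int size;

    /** Returns the canonical String for buf[start..start+len), allocating only on a miss. */
    String findSymbol(char[] buf, int start, int len) {
        // Same formula as String.hashCode(), computed straight from the buffer.
        int hash = 0;
        for (int i = start; i < start + len; i++) {
            hash = 31 * hash + buf[i];
        }
        int mask = buckets.length - 1;
        int idx = hash & mask;
        while (buckets[idx] != null) {
            if (matches(buckets[idx], buf, start, len)) {
                return buckets[idx];          // hit: no new String created
            }
            idx = (idx + 1) & mask;           // linear probing
        }
        String sym = new String(buf, start, len);
        buckets[idx] = sym;
        if (++size * 2 > buckets.length) {    // keep the table at most half full
            grow();
        }
        return sym;
    }

    int size() {
        return size;
    }

    private static boolean matches(String s, char[] buf, int start, int len) {
        if (s.length() != len) {
            return false;
        }
        for (int i = 0; i < len; i++) {
            if (s.charAt(i) != buf[start + i]) {
                return false;
            }
        }
        return true;
    }

    private void grow() {
        String[] old = buckets;
        buckets = new String[old.length * 2];
        int mask = buckets.length - 1;
        for (int i = 0; i < old.length; i++) {
            String s = old[i];
            if (s == null) {
                continue;
            }
            int idx = s.hashCode() & mask;
            while (buckets[idx] != null) {
                idx = (idx + 1) & mask;
            }
            buckets[idx] = s;
        }
    }
}
```

A parser would call something like findSymbol(buf, nameStart, nameLen) for
each element or attribute name; repeated names then come back as the same
String instance and can be compared with ==, much as with intern().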

Nevertheless, the overhead of calling intern() may well be negligible, since,
as you say, memory allocation (plus the eventual GC'ing of the object) is
likely to be the bigger performance concern.

-+ Tatu +-