lucene-dev mailing list archives

From Earwin Burrfoot <>
Subject String.intern() alternative for field names
Date Sun, 19 Apr 2009 10:50:00 GMT
Okay, we'd like to have equality-by-reference for field names,
yielding überfast comparisons in all our tight inner loops. But we
dislike the default String.intern() for its Java<->native transitions and
general lentitude.
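A minimal sketch of what equality-by-reference buys: once both sides are canonical instances, a tight loop can compare with `==` instead of `equals()`. The class and method names are hypothetical, and `new String(...)` merely simulates a field name that arrives from outside the VM.

```java
// Hypothetical helper illustrating reference comparison of interned field names.
final class FieldNameCompare {
    // Counts entries of `names` that are the very same object as `target`.
    // Only meaningful when both sides were canonicalized (interned) beforehand.
    static int countByReference(String[] names, String target) {
        int n = 0;
        for (int i = 0; i < names.length; i++) {
            if (names[i] == target) { // pointer compare, no char-by-char equals()
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        String target = "body";                  // literal: already in the JVM string pool
        String fromOutside = new String("body"); // equal contents, distinct object
        System.out.println(fromOutside == target);          // false
        System.out.println(fromOutside.intern() == target); // true
        String[] names = { "title", fromOutside.intern(), "body", "date" };
        System.out.println(countByReference(names, target)); // 2
    }
}
```

Without the `intern()` call, the `==` check would silently miss the name that came from outside, which is why every field name entering the hot path has to be canonicalized first.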
There's a perfect solution. I was too dumb to come up with it myself, but
fortunately I spotted it on the google-collections mailing list.

> Internally, we have a thing called an Interner.  It has an intern()
> method that works just like String.intern() except doesn't use permgen
> and works for any type.  It can use strong, weak or soft references.

Yay! With the exception of the weak/soft reference variety, that's exactly what we need.
Unluckily for us, it is not yet public, so we can't grab the code right away.

> It won't come out too soon because there are some changes happening to
> CustomConcurrentHashMap which it rests on; and because we're just putting
> all our effort into stabilizing right now, not features.

'Custom' there stands for supporting soft/weak keys/values, so we could
roll our own on top of the stock ConcurrentHashMap. Too bad we're not
on Java 5 yet.
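For reference, the roll-our-own option on Java 5+ could look roughly like this: a strong-reference interner for any type built on `ConcurrentHashMap.putIfAbsent`. This is a sketch under that assumption, not the google-collections Interner; the class name is hypothetical.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical strong-reference interner for any type (requires Java 5+).
final class ConcurrentInterner<T> {
    // Maps each value to its canonical instance; entries are never evicted.
    private final ConcurrentMap<T, T> pool = new ConcurrentHashMap<T, T>();

    // Returns the canonical instance equal to `value`, registering `value` if
    // none exists yet. Throws NullPointerException for null, since
    // ConcurrentHashMap rejects null keys.
    T intern(T value) {
        T existing = pool.get(value); // common case: already pooled, no locking
        if (existing != null) {
            return existing;
        }
        // Atomic insert: if another thread won the race, use its instance.
        existing = pool.putIfAbsent(value, value);
        return existing != null ? existing : value;
    }

    public static void main(String[] args) {
        ConcurrentInterner<String> names = new ConcurrentInterner<String>();
        String a = names.intern(new String("title"));
        String b = names.intern(new String("title"));
        System.out.println(a == b); // true
    }
}
```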
But! Nobody prevents us from taking a good ol' HashMap, making a static
instance of it and using copy-on-write semantics, with synchronization
happening only on addition to the map.
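The copy-on-write idea above could be sketched like this. Assumptions: the class name is hypothetical; the `volatile` publication relies on the Java 5+ memory model (a Java 1.4 version would also drop the generics); and new entries are canonicalized through `String.intern()` once, so results stay `==` to string literals. That last point is a design choice of this sketch, not something stated above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical copy-on-write string interner; not an existing Lucene class.
final class CopyOnWriteInterner {
    // Snapshot that is never mutated after publication. The volatile write in
    // addSlow() publishes a fully built map to lock-free readers.
    private static volatile Map<String, String> pool = new HashMap<String, String>();

    private CopyOnWriteInterner() {}

    // Returns the canonical instance for s (s must be non-null).
    static String intern(String s) {
        String cached = pool.get(s); // fast path: one volatile read, no lock
        return cached != null ? cached : addSlow(s);
    }

    // Slow path: only additions take the lock.
    private static synchronized String addSlow(String s) {
        String cached = pool.get(s); // re-check: another thread may have added it
        if (cached != null) {
            return cached;
        }
        // Canonicalize via String.intern() once, so results are == to literals.
        String canonical = s.intern();
        Map<String, String> copy = new HashMap<String, String>(pool);
        copy.put(canonical, canonical);
        pool = copy; // publish the new snapshot
        return canonical;
    }

    public static void main(String[] args) {
        String fromOutside = new String("contents");
        System.out.println(intern(fromOutside) == "contents"); // true
    }
}
```

Each addition copies the whole map, which is O(n), so the scheme only pays off when the set of field names is small and settles quickly; lookups of already-pooled names never block.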
It will actually be faster than String.intern() when you're trying to
intern strings that are already in the pool, and slower when you try to
intern something new. Well, if you constantly try to intern new strings,
you should really worry about something other than performance, like hitting an

I ran benchmarks for the proper use case, i.e. interning
strings that are already in the pool.
Running benchmark with 10000000 rounds
SunT = 1545ms, MyT = 98ms
SunNoninternT = 1731ms, MyNoninternT = 792ms

The first run intern()s a string that is already interned,
e.g. a constant.
That is most probably what happens when field names come from inside the VM.

The second run intern()s a string created via new String(constant).
That is most probably what happens when field names come from
outside the VM, as with remote invocations, or when you generate field names
on the fly. Sic!
In contrast to the previous case, String.hashCode() is not cached and is
calculated on each invocation, and String.equals() can't
short-circuit on reference equality. This run also allocates a
bajillion strings on the heap and does array copies. I alleviated the
first of these (the allocations) with a heap big enough to avoid GC.
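A rough sketch of the shape of the two runs, for anyone wanting to reproduce them. Assumptions: "Sun" in the labels means String.intern() and "My" means a map-based interner, modeled here as a lookup in a pre-populated HashMap (the fast path of the copy-on-write idea); `InternBench` and the constant are hypothetical. There is no JIT warm-up, and timings depend on the JDK's String implementation, so numbers from this sketch need not match the figures above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical micro-benchmark; indicative only (no warm-up, single pass).
final class InternBench {
    // Results land here so the JIT cannot discard the loops as dead code.
    static volatile int sink;

    // Elapsed ms for `rounds` String.intern() calls. If copyEachTime is true,
    // each call interns a fresh new String(constant), like the second run.
    static long timeStringIntern(String constant, int rounds, boolean copyEachTime) {
        long start = System.nanoTime();
        int acc = 0;
        for (int i = 0; i < rounds; i++) {
            String arg = copyEachTime ? new String(constant) : constant;
            acc += arg.intern().length();
        }
        sink = acc;
        return (System.nanoTime() - start) / 1000000L;
    }

    // Same loop, canonicalizing via a lookup in a pre-populated map instead.
    static long timeMapIntern(Map<String, String> pool, String constant, int rounds,
                              boolean copyEachTime) {
        long start = System.nanoTime();
        int acc = 0;
        for (int i = 0; i < rounds; i++) {
            String arg = copyEachTime ? new String(constant) : constant;
            acc += pool.get(arg).length();
        }
        sink = acc;
        return (System.nanoTime() - start) / 1000000L;
    }

    public static void main(String[] args) {
        int rounds = 10000000;
        String constant = "contents";
        Map<String, String> pool = new HashMap<String, String>();
        pool.put(constant, constant);
        System.out.println("SunT = " + timeStringIntern(constant, rounds, false)
            + "ms, MyT = " + timeMapIntern(pool, constant, rounds, false) + "ms");
        System.out.println("SunNoninternT = " + timeStringIntern(constant, rounds, true)
            + "ms, MyNoninternT = " + timeMapIntern(pool, constant, rounds, true) + "ms");
    }
}
```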

Should I make a patch?

Kirill Zakharenko/Кирилл Захаренко (
Home / Mobile: +7 (495) 683-567-4 / +7 (903) 5-888-423
ICQ: 104465785
