lucene-dev mailing list archives

From "Earwin Burrfoot (JIRA)" <>
Subject [jira] Updated: (LUCENE-1607) String.intern() faster alternative
Date Mon, 20 Apr 2009 20:42:47 GMT


Earwin Burrfoot updated LUCENE-1607:

    Attachment: LUCENE-1607.patch

Okay, I thought more about that. Yonik is amazing.

The fastest hash we can get should have no collisions. This is achievable by resizing on
each new collision. We should then introduce an upper bound on this process so that it does
not blow up. Finally, we can simply use that upper bound as the hash size from the start.
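
To make the idea concrete, here is a minimal sketch of a fixed-size cache that is allocated at its upper bound from the start. It is not the code from the attached patch: the class name, the constructor, and the choice to overwrite a slot on collision are all assumptions made for illustration.

```java
/**
 * Sketch of a fixed-size intern cache. All names here are hypothetical;
 * overwriting the slot on a collision is an assumption, not something
 * the message above specifies.
 */
final class InternCacheSketch {
    private final String[] slots;
    private final int mask;

    InternCacheSketch(int requestedSize) {
        // Round up to a power of two so (hash & mask) selects a slot.
        int size = 1;
        while (size < requestedSize && size < (1 << 30)) {
            size <<= 1;
        }
        slots = new String[size];
        mask = size - 1;
    }

    String intern(String s) {
        int slot = s.hashCode() & mask;
        String cached = slots[slot];
        // Hit: the slot already holds an equal, interned string.
        if (cached != null && (cached == s || cached.compareTo(s) == 0)) {
            return cached;
        }
        // Miss or collision: fall back to String.intern() and take over the slot.
        String interned = s.intern();
        slots[slot] = interned;
        return interned;
    }
}
```

The sketch does no synchronization. Because it only ever stores values returned by String.intern(), a lost or stale slot write should cost an extra String.intern() call rather than a wrong result, but that reasoning is mine and is not taken from the patch.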

I benchmarked a bit; it works better than HashTable:
somewhat better for already interned strings, much better for non-interned strings.
The "s1==s2 || s1.compareTo(s2) == 0" combo, surprisingly, works faster than s1.equals(s2).
An additional hashcode check makes access to a sparse hash a bit slower and doesn't really
help with a crowded hash.
Having a crowded hash degrades performance a lot.
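
For reference, the comparison combo mentioned above can be written as a small helper. This is only an illustration with hypothetical names; it shows the predicate itself and makes no claim about its speed.

```java
/** Hypothetical helper illustrating the identity-then-compareTo check. */
final class StringMatchSketch {
    /**
     * Returns true when both arguments denote the same character sequence.
     * Both arguments must be non-null: unlike s1.equals(s2), which returns
     * false for a null argument, compareTo throws NullPointerException.
     */
    static boolean sameString(String s1, String s2) {
        return s1 == s2 || s1.compareTo(s2) == 0;
    }
}
```

For non-null strings this returns the same answer as s1.equals(s2), since String.compareTo returns 0 exactly when equals would return true.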

I updated my patch with Yonik's algorithm. I kept everything in statics (faster) and allowed
the cache size to be changed through a system property for adventurous types. The default is
16k (works well for 100 values).
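
Reading the cache size from a system property could look roughly like the sketch below. The property name is made up for this example and is not the one used in the patch; treating "16k" as 16 * 1024 entries is also an assumption.

```java
/** Hypothetical configuration helper; the property name is invented for this sketch. */
final class CacheSizeConfig {
    // Assumed name, not taken from the patch.
    static final String SIZE_PROPERTY = "example.intern.cache.size";
    // Assumes "16k" means 16 * 1024 entries.
    static final int DEFAULT_SIZE = 16 * 1024;

    static int cacheSize() {
        // Integer.getInteger returns the default when the property is
        // missing or cannot be parsed as an integer.
        int size = Integer.getInteger(SIZE_PROPERTY, DEFAULT_SIZE);
        // Fall back to the default for zero or negative values.
        return size > 0 ? size : DEFAULT_SIZE;
    }
}
```

With this sketch, the size would be overridden by launching the JVM with -Dexample.intern.cache.size=4096.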

> String.intern() faster alternative
> ----------------------------------
>                 Key: LUCENE-1607
>                 URL:
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Earwin Burrfoot
>             Fix For: 2.9
>         Attachments: intern.patch, LUCENE-1607.patch, LUCENE-1607.patch, LUCENE-1607.patch,
> By using our own interned string pool on top of default, String.intern() can be greatly
> On my setup (java 6) this alternative runs ~15.8x faster for already interned strings,
> and ~2.2x faster for 'new String(interned)'
> For java 5 and 4 speedup is lower, but still considerable.

