lucene-dev mailing list archives

From "Yonik Seeley (JIRA)" <j...@apache.org>
Subject [jira] Issue Comment Edited: (LUCENE-1607) String.intern() faster alternative
Date Mon, 20 Apr 2009 21:28:47 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12700931#action_12700931 ]

Yonik Seeley edited comment on LUCENE-1607 at 4/20/09 2:27 PM:
---------------------------------------------------------------

bq. The fastest hash we can get, should have no collisions. This is achievable by resizing
on each new collision.

*edit*: agreed for the first version, which was only a cache where a collision invalidates
the entry and causes another String.intern() call. My comments below apply to the second
version of my code, where interned strings are never dropped from the table.

Hmmm, in my quick'n'dirty tests with about 256 unique strings, a smaller hash table was actually
quicker (initialized at 32 and left to resize, versus starting at 1024). I imagine this is
because a larger part of the table fits in the smaller, faster processor caches. YMMV.
Collisions should also be very quick to skip by comparing the hash code first (which is
cached for Strings).
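
To make the collision-skipping point concrete, here is a minimal sketch of a table where
interned strings are never dropped. It is not the code from the attached patches: the class
name SimpleInternTable, the linear probing, and the resize-at-half-full policy are all
assumptions chosen for illustration, and it is not thread-safe.

```java
// Hypothetical sketch only; not taken from LUCENE-1607.patch.
final class SimpleInternTable {
    private String[] table;   // open-addressing slots; null means empty
    private int size;         // number of strings currently stored

    SimpleInternTable(int initialCapacity) {
        // Round up to a power of two so (hash & mask) selects a slot.
        int cap = 1;
        while (cap < initialCapacity) cap <<= 1;
        table = new String[cap];
    }

    String intern(String s) {
        int h = s.hashCode();              // String caches this after the first call
        int mask = table.length - 1;
        int slot = h & mask;
        for (String other = table[slot]; other != null; other = table[slot]) {
            if (other == s) return other;  // already the canonical instance
            // Cheap int comparison first; equals() only runs on a hash match.
            if (other.hashCode() == h && other.equals(s)) return other;
            slot = (slot + 1) & mask;      // linear probe past the collision
        }
        String canonical = s.intern();     // fall back to the JVM pool once per string
        table[slot] = canonical;
        if (++size > table.length / 2) resize();
        return canonical;
    }

    // Doubles the table and reinserts every entry; entries are never dropped.
    private void resize() {
        String[] old = table;
        table = new String[old.length * 2];
        int mask = table.length - 1;
        for (String e : old) {
            if (e == null) continue;
            int slot = e.hashCode() & mask;
            while (table[slot] != null) slot = (slot + 1) & mask;
            table[slot] = e;
        }
    }

    int capacity() { return table.length; }
}
```

Starting with new SimpleInternTable(32) and letting it grow mirrors the small-table setup
described above. Whether that beats a table presized to 1024 depends on the workload and
the hardware, so measure before relying on it.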



> String.intern() faster alternative
> ----------------------------------
>
>                 Key: LUCENE-1607
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1607
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Earwin Burrfoot
>             Fix For: 2.9
>
>         Attachments: intern.patch, LUCENE-1607.patch, LUCENE-1607.patch, LUCENE-1607.patch, LUCENE-1607.patch
>
>
> By using our own interned string pool on top of the default one, String.intern() can be greatly optimized.
> On my setup (Java 6) this alternative runs ~15.8x faster for already interned strings, and ~2.2x faster for 'new String(interned)'.
> For Java 5 and 4 the speedup is lower, but still considerable.

