lucene-dev mailing list archives

From Robert Muir <rcm...@gmail.com>
Subject Re: Sorting with little memory: A suggestion
Date Fri, 19 Mar 2010 21:45:39 GMT
On Fri, Mar 19, 2010 at 5:42 PM, Toke Eskildsen <te@statsbiblioteket.dk> wrote:

> It sounds like I'm missing something here... A quick check of running 20000 random Strings
> of 30 characters from a-zA-Z0-1 + 20 different national characters through Java's Collator
> returned an average collatorKey-length of 175 bytes. On http://wiki.apache.org/solr/UnicodeCollation
> it is stated that a standard sort is used, which - to my knowledge - loads the Strings into
> memory. For my quick test, this means a tripling of memory usage for the sort field when indexing
> collatorKeys?
>

Right, JDK collation keys are bloated; use ICU for the collation keys too:
http://site.icu-project.org/charts/collation-icu4j-sun
At 1.59 bytes/char, ICU keys are smaller than the same text in UTF-16 (2 bytes/char).
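
For anyone who wants to repeat the quick check quoted above, here is a minimal sketch using only the JDK. The class name, the alphabet (a handful of national characters rather than the original 20), the seed and the Danish locale are all assumptions for illustration; it is not the code behind the 175-byte figure. Assuming ICU4J is on the classpath, swapping in com.ibm.icu.text.Collator (whose getCollationKey(...).toByteArray() has the same shape) should allow a side-by-side comparison.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Random;

// Sketch: measure the average size of JDK collation keys for random terms.
final class CollationKeySize {
    // Assumed alphabet: ASCII letters and digits plus a few national characters
    // (written as Unicode escapes so the source compiles under any file encoding).
    static final String ALPHABET =
        "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"
        + "\u00e6\u00f8\u00e5\u00c6\u00d8\u00c5\u00e4\u00f6\u00fc\u00e9\u00f1\u00e7";

    // Builds `count` pseudo-random strings of `length` chars; a fixed seed makes runs repeatable.
    static List<String> randomStrings(int count, int length, long seed) {
        Random random = new Random(seed);
        List<String> result = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            StringBuilder sb = new StringBuilder(length);
            for (int j = 0; j < length; j++) {
                sb.append(ALPHABET.charAt(random.nextInt(ALPHABET.length())));
            }
            result.add(sb.toString());
        }
        return result;
    }

    // Average length in bytes of the collation keys the given collator produces.
    static double averageKeyBytes(Collator collator, List<String> terms) {
        long total = 0;
        for (String term : terms) {
            total += collator.getCollationKey(term).toByteArray().length;
        }
        return terms.isEmpty() ? 0.0 : (double) total / terms.size();
    }

    public static void main(String[] args) {
        int length = 30;
        List<String> terms = randomStrings(20000, length, 42L);
        Collator jdk = Collator.getInstance(Locale.forLanguageTag("da")); // assumed locale
        double avg = averageKeyBytes(jdk, terms);
        System.out.printf("JDK collation key: %.1f bytes/term, %.2f bytes/char%n", avg, avg / length);
        // Each char in the assumed alphabet is one UTF-16 code unit, i.e. 2 bytes.
        System.out.printf("UTF-16 term data:  %d bytes/term%n", length * 2);
    }
}
```

The number printed will depend on the JDK version, locale and collator strength, so treat it as a rough indication only.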


-- 
Robert Muir
rcmuir@gmail.com


