lucene-dev mailing list archives

From Robert Muir
Subject Re: Sorting with little memory: A suggestion
Date Fri, 19 Mar 2010 21:45:39 GMT
On Fri, Mar 19, 2010 at 5:42 PM, Toke Eskildsen wrote:

> It sounds like I'm missing something here... A quick check of running 20000 random Strings
> of 30 characters from a-zA-Z0-1 + 20 different national characters through Java's Collator
> returned an average collatorKey-length of 175 bytes. On
> it is stated that a standard sort is used, which - to my knowledge - loads the Strings into
> memory. For my quick test, this means a tripling of memory usage for the sort field when indexing

Right, JDK collation sucks; use ICU for collation keys too:
At 1.59 bytes/char, that's less than UTF-16.
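
For anyone who wants to reproduce this kind of measurement, here is a minimal sketch of the quick check described in the quoted text, using only the JDK's java.text.Collator. The class name, the helper names, the Danish locale, the fixed seed, and the particular 20 non-ASCII letters are all assumptions made for illustration; the thread does not say which were used, so the numbers it prints will not necessarily match the 175 bytes quoted above. The same loop should carry over to ICU4J's com.ibm.icu.text.Collator (which also offers getCollationKey(String) and CollationKey.toByteArray()), assuming the ICU4J jar is on the classpath and the parameter type is changed accordingly.

```java
import java.text.Collator;
import java.util.Locale;
import java.util.Random;

public class CollationKeySize {
    // Assumed alphabet: a-z, A-Z, digits, plus 20 non-ASCII letters.
    // The original test's exact character set is not given in the thread.
    static final String ALPHABET =
            "abcdefghijklmnopqrstuvwxyz"
          + "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
          + "0123456789"
          + "æøåÆØÅäöüÄÖÜßéèêñçáí";

    // Builds one random string of the requested length from ALPHABET.
    static String randomString(Random rnd, int length) {
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append(ALPHABET.charAt(rnd.nextInt(ALPHABET.length())));
        }
        return sb.toString();
    }

    // Average size in bytes of the collation keys for `count` random strings.
    static double averageKeyBytes(Collator collator, int count, int length, long seed) {
        Random rnd = new Random(seed);
        long totalBytes = 0;
        for (int i = 0; i < count; i++) {
            String s = randomString(rnd, length);
            totalBytes += collator.getCollationKey(s).toByteArray().length;
        }
        return (double) totalBytes / count;
    }

    public static void main(String[] args) {
        int count = 20000;   // number of strings, as in the quoted test
        int length = 30;     // characters per string, as in the quoted test
        // Locale is an assumption; the thread does not name one.
        Collator collator = Collator.getInstance(Locale.forLanguageTag("da"));
        double avg = averageKeyBytes(collator, count, length, 42L);
        System.out.printf("average key: %.1f bytes (%.2f bytes/char), UTF-16 string: %d bytes%n",
                avg, avg / length, length * 2);
    }
}
```

Comparing the printed average against the 2 bytes/char of the UTF-16 string gives the memory overhead of holding keys instead of (or in addition to) the Strings themselves.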

Robert Muir
