lucene-dev mailing list archives

From Toke Eskildsen ...@statsbiblioteket.dk>
Subject RE: Sorting with little memory: A suggestion
Date Fri, 19 Mar 2010 21:42:23 GMT
From: Robert Muir [rcmuir@gmail.com]:

[Toke: Indexing collation keys only helps with the speed problem]

> I don't really understand this measurement, collation keys are
> byte[]... (although it's true we don't yet encode them this way in
> flex, I think we should)

It sounds like I'm missing something here... A quick check of running 20000 random Strings
of 30 characters from a-zA-Z0-1 + 20 different national characters through Java's Collator
returned an average CollationKey length of 175 bytes. On http://wiki.apache.org/solr/UnicodeCollation
it is stated that a standard sort is used, which - to my knowledge - loads the Strings into
memory. For my quick test, this means a tripling of memory usage for the sort field when indexing
CollationKeys?
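
For anyone who wants to repeat the quick check, here is a minimal sketch of how such a measurement could be set up. It is not the original test code: the class name CollationKeySize, the helper names, the choice of 20 national characters, the digit range 0-9, the Danish locale and the seed are all assumptions made for illustration, so the average it prints need not match the 175 bytes quoted above.

```java
import java.text.CollationKey;
import java.text.Collator;
import java.util.Locale;
import java.util.Random;

public class CollationKeySize {
    // Assumed alphabet: a-z, A-Z, 0-9 plus 20 national characters
    // (Danish, German, French and Spanish letters, written as Unicode escapes).
    static final String ALPHABET =
        "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"
        + "\u00e6\u00f8\u00e5\u00c6\u00d8\u00c5\u00e4\u00f6\u00fc\u00c4"
        + "\u00d6\u00dc\u00e9\u00e8\u00ea\u00f1\u00e7\u00df\u00e1\u00ed";

    // Builds one random String of the given length from ALPHABET.
    static String randomString(Random random, int length) {
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append(ALPHABET.charAt(random.nextInt(ALPHABET.length())));
        }
        return sb.toString();
    }

    // Averages the byte length of the collation keys for `count` random Strings.
    static double averageKeyLength(Collator collator, int count, int length, long seed) {
        Random random = new Random(seed);
        long totalBytes = 0;
        for (int i = 0; i < count; i++) {
            CollationKey key = collator.getCollationKey(randomString(random, length));
            totalBytes += key.toByteArray().length;
        }
        return (double) totalBytes / count;
    }

    public static void main(String[] args) {
        // Assumption: a Danish collator; the original test does not name its locale.
        Collator collator = Collator.getInstance(Locale.forLanguageTag("da"));
        double average = averageKeyLength(collator, 20000, 30, 42L);
        System.out.printf("Average collation key length: %.1f bytes%n", average);
        // A 30-character String held as Java chars needs about 2 bytes per char.
        System.out.printf("Raw char data per String: %d bytes%n", 30 * 2);
    }
}
```

If the Strings are held in memory as Java chars, 30 characters take roughly 60 bytes of character data, so an average key of about 175 bytes would be close to three times that, which is presumably where the tripling comes from.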

Regards,
Toke Eskildsen