lucene-dev mailing list archives

From Robert Muir <rcm...@gmail.com>
Subject Re: Sorting with little memory: A suggestion
Date Fri, 19 Mar 2010 17:57:24 GMT
On Fri, Mar 19, 2010 at 1:46 PM, Toke Eskildsen <te@statsbiblioteket.dk> wrote:
> From: Robert Muir [rcmuir@gmail.com]:
>> Toke, only partially-on-topic here, is it possible to describe your
>> use-case a little more where it's preferable to use this Locale-based
>> sort instead of indexing collation keys (e.g. you have to support so
>> many locales this would be too much indexing overhead?)
>
> My original use case was to avoid the memory overhead: Looking at our current index,
> we have ~7.5M documents with ~7M unique titles. They take up about 362MB as UTF-8 bytes, which
> translates to a neat 1GB of RAM as Java Strings. That's 1GB less heap for other stuff for
> us, plus a sort is fairly slow. Indexing collation keys only helps with the speed problem.

I don't really understand this measurement: collation keys are
byte[]... (although it's true we don't yet encode them this way in
flex; I think we should)
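
To illustrate the byte[] point, here is a minimal sketch using only the
JDK's java.text.Collator. It is not Lucene's indexing code. The class name
CollationKeyBytes, its helper methods, the Danish locale, and the sample
titles are all assumptions made up for this example. It shows that a
collation key can be held as a plain byte[] and ordered with an unsigned
bytewise comparison, so no Java String needs to be kept per sort term.

```java
import java.nio.charset.StandardCharsets;
import java.text.Collator;
import java.util.Arrays;
import java.util.Locale;

// Hypothetical illustration class; not part of Lucene.
public class CollationKeyBytes {

    /** Returns the collation key of text under the given collator as raw bytes. */
    public static byte[] keyBytes(Collator collator, String text) {
        return collator.getCollationKey(text).toByteArray();
    }

    /** Compares two keys byte by byte, treating every byte as unsigned. */
    public static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (diff != 0) {
                return diff;
            }
        }
        // When one key is a prefix of the other, the shorter key sorts first.
        return a.length - b.length;
    }

    public static void main(String[] args) {
        // Assumed locale and sample data, chosen only for the example.
        Collator collator = Collator.getInstance(new Locale("da", "DK"));
        String[] titles = {"Zebra", "Ærø", "Aalborg", "apple"};

        // Sort by comparing key bytes instead of calling collator.compare().
        Arrays.sort(titles, (x, y) ->
                compareUnsigned(keyBytes(collator, x), keyBytes(collator, y)));

        for (String title : titles) {
            int utf8Length = title.getBytes(StandardCharsets.UTF_8).length;
            int keyLength = keyBytes(collator, title).length;
            System.out.println(title + ": utf8=" + utf8Length + " bytes, key=" + keyLength + " bytes");
        }
    }
}
```

The CollationKey documentation states that comparing the byte arrays gives
the same result as comparing the keys themselves, which is what makes the
bytewise comparison above valid. Key sizes depend on the collator and its
strength, so the printed lengths are only indicative.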

-- 
Robert Muir
rcmuir@gmail.com
