lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Michael McCandless <luc...@mikemccandless.com>
Subject Re: is this the right way to go?
Date Tue, 15 Jun 2010 09:20:04 GMT
Also remember that a String object in Java has a high overhead -- the
object header (maybe 8 bytes), 3 int fields, a pointer (4 or 8 bytes)
to the (separate) char[], and then another object header for that
char[], and 2 bytes per character in the string.

Lucene trunk (4.0-dev) has cutover to shared byte[] holding the term
data as UTF8 encoded bytes, and mapping the doc -> ordinal using
packed ints... this is a sizable memory reduction in most cases.

Details are in https://issues.apache.org/jira/browse/LUCENE-2380

Mike

On Tue, Jun 15, 2010 at 3:56 AM, Toke Eskildsen <te@statsbiblioteket.dk> wrote:
> On Thu, 2010-06-10 at 04:03 +0200, fujian wrote:
>> Another thing is about unique. I thought it was unique "field value". If it
>> means unique term, for English even loading all around 300,000 terms it
>> won't take much memory, right? (Suppose the average length of term is 10,
>> the total memory usage is 10*300,000=3MB)
>
> It is only the unique field values, but remember that there is also an
> array of length #docs with pointers to the strings that takes up 4 or 8
> bytes/pointer, depending on 32bit/64bit JVM. Furthermore, the current
> Lucene uses Strings which takes up a lot more than just #chars bytes:
> 300.000 Strings of average length 10 chars is is about 18MB.
> http://www.javamex.com/tutorials/memory/string_memory_usage.shtml
>
>
> I'm quietly hacking on a solution for this, but the current code is
> still at the proof of concept-stage and way too flaky to use for
> production: https://issues.apache.org/jira/browse/LUCENE-2369
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message