lucene-java-user mailing list archives

From Toke Eskildsen
Subject Re: is this the right way to go?
Date Tue, 15 Jun 2010 07:56:01 GMT
On Thu, 2010-06-10 at 04:03 +0200, fujian wrote:
> Another thing is about unique. I thought it was unique "field value". If it
> means unique term, for English even loading all around 300,000 terms it
> won't take much memory, right? (Suppose the average length of term is 10,
> the total memory usage is 10*300,000=3MB)

It is only the unique field values, but remember that there is also an
array of length #docs with pointers to the strings, which takes up 4 or 8
bytes per pointer, depending on whether the JVM is 32-bit or 64-bit.
Furthermore, the current Lucene uses Strings, which take up a lot more
than just #chars bytes: 300,000 Strings with an average length of 10
chars come to about 18MB.
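
To make the arithmetic concrete, here is a rough back-of-the-envelope
sketch. The class name, the method names and all object-layout constants
(8-byte headers and 4-byte references for a 32-bit JVM, 8-byte alignment,
a String holding a char[] reference plus three int fields) are my
assumptions for illustration, not measured values, and the 1,000,000
document count is purely hypothetical. Real numbers vary with JVM
version and settings.

```java
// Rough estimate of heap use for a String-per-value field cache.
// All layout constants passed in are assumptions, not measurements.
class FieldCacheEstimate {

    // Round a size up to the next multiple of 8 (assumed object alignment).
    static long align8(long bytes) {
        return (bytes + 7) / 8 * 8;
    }

    // Estimated bytes for one String holding `chars` characters.
    // headerBytes: assumed object header size; refBytes: assumed reference size.
    static long stringBytes(int chars, int headerBytes, int refBytes) {
        // String object: header + char[] reference + three int fields
        // (offset, count, hash), as assumed for JVMs of that era.
        long stringObject = align8(headerBytes + refBytes + 3 * 4);
        // Backing char[]: header + int length + 2 bytes per char.
        long charArray = align8(headerBytes + 4 + 2L * chars);
        return stringObject + charArray;
    }

    // Estimated bytes for the unique Strings plus the per-document pointer array.
    static long cacheBytes(long uniqueValues, int avgChars, long docs,
                           int headerBytes, int refBytes) {
        long strings = uniqueValues * stringBytes(avgChars, headerBytes, refBytes);
        long pointers = docs * refBytes; // one reference per document
        return strings + pointers;
    }

    public static void main(String[] args) {
        // Assumed 32-bit layout: 8-byte header, 4-byte references.
        long perString = stringBytes(10, 8, 4);
        long total = cacheBytes(300_000, 10, 1_000_000, 8, 4);
        System.out.println("bytes per String: " + perString);
        System.out.println("bytes for cache:  " + total);
    }
}
```

Under these assumptions a 10-char String costs several times its 10
characters, which puts 300,000 of them in the same ballpark as the
figure above, and the pointer array grows with #docs regardless of how
few unique values there are.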

I'm quietly hacking on a solution for this, but the current code is
still at the proof-of-concept stage and way too flaky to use for
