lucene-dev mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: Memory consumption in trunk for sorting and faceting
Date Sun, 12 Jun 2011 14:49:05 GMT
Yep, thanks. If I'm reading the JIRA right, the FST
stuff is on 3x, and I just tested that the same way
and got a memory footprint comparable to 1.4.1...

So this is pretty much all in the ByteRefs, right?

And my crude tests hit the worst case: unique strings.

Thanks
Erick

On Sun, Jun 12, 2011 at 10:06 AM, Michael McCandless
<lucene@mikemccandless.com> wrote:
> Right, not using objects is a huge win, especially on 64 bit JRE.
>
> Cutting over to UTF8 bytes is also a big drop in certain cases, since
> it's UTF8 vs UTF16 for 3.x.
>
> I.e., simple ASCII fields take half the storage vs 3.x.
>
> Similarly, the terms index in 3.x uses multiple objects per indexed
> Term, and no objects in trunk (since it's just a single byte[] holding
> the FST), and also uses UTF8 to hold the term data, instead of UTF16.
>
> FST has been backported to 3.x but it's not used yet I think;
> back-porting the terms index improvements would be a biggish change...
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
> On Sun, Jun 12, 2011 at 9:51 AM, Dawid Weiss
> <dawid.weiss@cs.put.poznan.pl> wrote:
>>> Is it fair to say that the two big innovations that have reduced the
>>> memory footprint are:
>>> 1> going to byte arrays for string storage
>>> 2> the FST work?
>>>
>>> Final question. It looks like the FST work is back-ported to the
>>> current 3_x code branch, is that true? Anything else back-ported
>>> there? I'll check that branch out and give it a whirl for kicks.
>>
>> I'm guessing it's going from Strings to ByteRefs (objects have
>> considerable overhead, really). This used to be my favorite showcase
>> for students -- manipulate a large array of Integer[] vs. manipulate
>> the same size array of int[]. A similar thing applies to String
>> instances vs. ByteRefs (utf16 vs. utf8 encoding, object header
>> overhead, etc).
>>
>> Dawid
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: dev-help@lucene.apache.org
>>
>>
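To make the two storage points in the thread concrete (UTF8 vs UTF16, and one shared byte[] instead of an object per term), here is a minimal Java sketch. It is only an illustration under stated assumptions: the class and method names (TermStorageSketch, PackedTerms, utf8Bytes, utf16Bytes) are hypothetical and are not Lucene APIs, and it counts character payload only, leaving out JVM-dependent object header costs.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; none of these names come from Lucene.
class TermStorageSketch {

    // Bytes needed for the term's characters when encoded as UTF-8.
    static int utf8Bytes(String term) {
        return term.getBytes(StandardCharsets.UTF_8).length;
    }

    // Bytes needed for the same characters as UTF-16 code units (2 bytes each).
    static int utf16Bytes(String term) {
        return term.length() * 2; // String.length() counts UTF-16 code units
    }

    // All terms packed into one byte[] plus one int[] of start offsets, so the
    // per-term cost is its UTF-8 bytes plus one int, with no per-term object.
    static final class PackedTerms {
        final byte[] bytes;
        final int[] offsets; // offsets[i] .. offsets[i + 1] delimits term i

        PackedTerms(String[] terms) {
            offsets = new int[terms.length + 1];
            byte[][] encoded = new byte[terms.length][];
            for (int i = 0; i < terms.length; i++) {
                encoded[i] = terms[i].getBytes(StandardCharsets.UTF_8);
                offsets[i + 1] = offsets[i] + encoded[i].length;
            }
            bytes = new byte[offsets[terms.length]];
            for (int i = 0; i < terms.length; i++) {
                System.arraycopy(encoded[i], 0, bytes, offsets[i], encoded[i].length);
            }
        }

        // Decodes term i on demand; a String is created only when asked for.
        String get(int i) {
            return new String(bytes, offsets[i], offsets[i + 1] - offsets[i],
                    StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) {
        // Two ASCII terms and one term with a non-ASCII character.
        String[] terms = {"lucene", "faceting", "Pozna\u0144"};
        for (String t : terms) {
            System.out.println(t + ": utf8=" + utf8Bytes(t) + " utf16=" + utf16Bytes(t));
        }
        PackedTerms packed = new PackedTerms(terms);
        System.out.println("packed payload bytes: " + packed.bytes.length);
    }
}
```

For a plain ASCII term such as "lucene" the payload is 6 bytes as UTF-8 against 12 as UTF-16, which is the "half the storage" case mentioned above. The savings from dropping per-term objects come on top of that and depend on the JVM, so the sketch does not try to put a number on them.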
