lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Michael McCandless <luc...@mikemccandless.com>
Subject Re: heap memory issues when sorting by a string field
Date Thu, 17 Dec 2009 10:34:13 GMT
I think this'd make a nice contribution -- eg it could be bundled up
as a FieldComparator impl, eg LowMemoryStringComparator, that would
compute the global ords in multiple passes with limited RAM usage.

It'd give users the space/time tradeoff...

Mike

On Mon, Dec 14, 2009 at 9:09 AM, Toke Eskildsen <te@statsbiblioteket.dk> wrote:
> On Fri, 2009-12-11 at 14:53 +0100, Michael McCandless wrote:
>> How long does Lucene take to build the ords for the toplevel reader?
>>
>> You should be able to just time FieldCache.getStringIndex(topLevelReader).
>>
>> I think your 8.5 seconds for first Lucene search was with the
>> StringIndex computed per segment?
>
> Cold disk-cache (directly after reboot):
> [2009-12-14 14:44:10,914] Requesting StringIndex for field sort_title
> [2009-12-14 14:44:20,326] Got StringIndex of length 2916008 in 9
> seconds, 412 ms
>
> Warm disk-cache (3 minutes after first test):
> [2009-12-14 14:44:10,914] Requesting StringIndex for field sort_title
> [2009-12-14 14:44:20,326] Got StringIndex of length 2916008 in 8
> seconds, 414 ms
>
> The response time for the first sorted search was about 8,5 seconds, but
> that was after 6 non-sorted searches without the use of explicit field
> cache, so some amount of warm-up was performed.
>
> Caveat: I must stress that this is very much ad hoc testing.
>
>
> ----------------- FieldCache test code
>
>    // Meant for testing
>    private FieldCache.StringIndex getStringIndex(
>            IndexReader reader, String field) {
>        log.info("Requesting StringIndex for field " + field);
>        Profiler profiler = new Profiler();
>        FieldCache.StringIndex stringIndex;
>        try {
>            stringIndex = FieldCache.DEFAULT.getStringIndex(reader,
> field);
>        } catch (IOException e) {
>            log.error("Could not retrieve StringIndex", e);
>            return null;
>        }
>        log.info("Got StringIndex of length " + stringIndex.order.length
>                 + " in " + profiler.getSpendTime());
>        return stringIndex;
>    }
>
> ----------------- Lucene 2.4 index
>
> ls -l index/sb/20091201-115941/lucene/
>
> -rw-rw-r-- 1 summatst summatst 12840211452 Dec  2 11:21 _0.cfx
> -rw-rw-r-- 1 summatst summatst   361027455 Dec  2 11:19 _32.cfs
> -rw-rw-r-- 1 summatst summatst   373374178 Dec  2 11:19 _65.cfs
> -rw-rw-r-- 1 summatst summatst   438076782 Dec  2 11:21 _98.cfs
> -rw-rw-r-- 1 summatst summatst   463141239 Dec  2 11:19 _cb.cfs
> -rw-rw-r-- 1 summatst summatst  1862427706 Dec  2 11:19 _rm.cfs
> -rw-rw-r-- 1 summatst summatst         203 Dec  2 11:21 segments_3
> -rw-rw-r-- 1 summatst summatst          20 Dec  2 11:18 segments.gen
>
> -----------------
>
> Regards,
> Toke Eskildsen
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message