lucene-java-user mailing list archives

From Michael McCandless
Subject Re: Exploiting a whole lot of memory
Date Tue, 08 Oct 2013 21:50:01 GMT

It stores all terms + postings as simple Java arrays, uncompressed.
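
To make "terms + postings as simple Java arrays" concrete, here is a minimal sketch of the general idea. It is not Lucene code and does not use any Lucene API; the class name ArrayPostingsSketch and the methods docsFor and intersect are hypothetical, chosen only for illustration. It assumes whitespace tokenization and a small read-only document set, and keeps one sorted String[] of terms plus one int[] of doc IDs per term, so a lookup is a binary search followed by a direct array read with no decoding step.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.TreeMap;

/** Illustrative only: a tiny read-only inverted index held in plain arrays. */
final class ArrayPostingsSketch {
    private final String[] terms;    // sorted, unique terms
    private final int[][] postings;  // postings[i] = ascending doc IDs for terms[i]

    ArrayPostingsSketch(String[] docs) {
        // Build with a TreeMap so the terms come out in sorted order.
        TreeMap<String, List<Integer>> tmp = new TreeMap<>();
        for (int docId = 0; docId < docs.length; docId++) {
            for (String tok : docs[docId].toLowerCase(Locale.ROOT).split("\\s+")) {
                if (tok.isEmpty()) {
                    continue;
                }
                List<Integer> list = tmp.computeIfAbsent(tok, k -> new ArrayList<>());
                // Record each doc at most once per term.
                if (list.isEmpty() || list.get(list.size() - 1) != docId) {
                    list.add(docId);
                }
            }
        }
        // Freeze into uncompressed arrays: this is the space spent to buy time.
        terms = tmp.keySet().toArray(new String[0]);
        postings = new int[terms.length][];
        int i = 0;
        for (List<Integer> list : tmp.values()) {
            postings[i++] = list.stream().mapToInt(Integer::intValue).toArray();
        }
    }

    /** Returns the shared postings array for a term (callers must not modify it). */
    int[] docsFor(String term) {
        int idx = Arrays.binarySearch(terms, term);
        return idx >= 0 ? postings[idx] : new int[0];
    }

    /** Intersects two ascending doc ID arrays with a two-pointer merge. */
    static int[] intersect(int[] a, int[] b) {
        int[] out = new int[Math.min(a.length, b.length)];
        int i = 0, j = 0, n = 0;
        while (i < a.length && j < b.length) {
            if (a[i] < b[j]) {
                i++;
            } else if (a[i] > b[j]) {
                j++;
            } else {
                out[n++] = a[i];
                i++;
                j++;
            }
        }
        return Arrays.copyOf(out, n);
    }

    public static void main(String[] args) {
        ArrayPostingsSketch idx = new ArrayPostingsSketch(new String[] {
            "lucene stores postings",
            "postings as java arrays",
            "java arrays uncompressed"
        });
        System.out.println(Arrays.toString(idx.docsFor("postings"))); // [0, 1]
        System.out.println(Arrays.toString(
            intersect(idx.docsFor("postings"), idx.docsFor("java")))); // [1]
    }
}
```

The trade-off in this sketch is the one raised in the question: the arrays take more heap than a compressed on-disk encoding would, but a query reads them directly and allocates only its result array.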

Mike McCandless

On Tue, Oct 8, 2013 at 5:45 PM, Benson Margulies wrote:
> Consider a Lucene index consisting of 10m documents with a total disk
> footprint of 3G. Consider an application that treats this index as
> read-only, and runs very complex queries over it. Queries with many terms,
> some of them 'fuzzy' and 'should' terms and a dismax. And, finally,
> consider doing all this on a box with over 100G of physical memory, some
> cores, and nothing else to do with its time.
> I should probably just stop here and see what thoughts come back, but I'll
> go out on a limb and type the word 'codec'. The MMapDirectory, of course,
> cheerfully gets to keep every single bit in memory. And then each query
> runs, exercising the codec, building up a flurry of Java objects, all
> of which turn into garbage and we start all over. So, I find myself
> wondering, is there some sort of an opportunity for a codec-that-caches in
> here? In other words, I'd like to sell some of my space to buy some time.
