lucene-java-user mailing list archives

From Benson Margulies <ben...@basistech.com>
Subject Re: Exploiting a whole lot of memory
Date Wed, 09 Oct 2013 23:13:55 GMT
On Tue, Oct 8, 2013 at 5:50 PM, Michael McCandless <lucene@mikemccandless.com> wrote:

> DirectPostingsFormat?
>
> It stores all terms + postings as simple java arrays, uncompressed.
>

This definitely sped things up in my benchmark, but I'm greedy for more.
I just made a codec that returns it as the postings format; is that the
whole recipe? Does it make sense to extend this to any of the other
codec pieces?
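
To make the space-for-time trade concrete, here is a minimal, self-contained sketch. It is not Lucene's code and does not use any Lucene API; the class name PostingsSketch and its methods are hypothetical, chosen only for illustration. It contrasts a compact delta plus variable-length-byte encoding of a sorted doc ID list (which has to be decoded on every read) with the same list held as a plain int[], which is the general idea behind keeping postings "as simple java arrays, uncompressed".

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

/** Illustrative only: a toy postings list, not Lucene's implementation. */
final class PostingsSketch {

    /** Compact form: delta-encode a sorted doc ID list, 7 bits per byte. */
    static byte[] encode(int[] sortedDocIds) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int previous = 0;
        for (int docId : sortedDocIds) {
            int delta = docId - previous; // non-negative because input is sorted
            previous = docId;
            // High bit set means "more bytes follow" for this delta.
            while ((delta & ~0x7F) != 0) {
                out.write((delta & 0x7F) | 0x80);
                delta >>>= 7;
            }
            out.write(delta);
        }
        return out.toByteArray();
    }

    /** Expands the compact form into a plain int[]; this is the work a reader repeats per query. */
    static int[] decode(byte[] bytes, int count) {
        int[] docIds = new int[count];
        int pos = 0;
        int previous = 0;
        for (int i = 0; i < count; i++) {
            int delta = 0;
            int shift = 0;
            byte b;
            do {
                b = bytes[pos++];
                delta |= (b & 0x7F) << shift;
                shift += 7;
            } while ((b & 0x80) != 0);
            previous += delta;
            docIds[i] = previous;
        }
        return docIds;
    }

    public static void main(String[] args) {
        int[] docIds = {3, 7, 300, 100000};
        byte[] compact = encode(docIds);
        // Decode once and keep the array: more heap, no per-query decoding.
        int[] direct = decode(compact, docIds.length);
        System.out.println("compact bytes: " + compact.length);
        System.out.println("direct bytes:  " + direct.length * Integer.BYTES);
        System.out.println("round trip ok: " + Arrays.equals(docIds, direct));
    }
}
```

The decoded array costs more memory than the encoded bytes, but iterating over it needs no decoding and allocates nothing per query, which is the kind of trade a box with spare RAM can afford.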

>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
>
> On Tue, Oct 8, 2013 at 5:45 PM, Benson Margulies <benson@basistech.com>
> wrote:
> > Consider a Lucene index consisting of 10m documents with a total disk
> > footprint of 3G. Consider an application that treats this index as
> > read-only, and runs very complex queries over it. Queries with many
> > terms, some of them 'fuzzy' and 'should' terms and a dismax. And,
> > finally, consider doing all this on a box with over 100G of physical
> > memory, some cores, and nothing else to do with its time.
> >
> > I should probably just stop here and see what thoughts come back, but
> > I'll go out on a limb and type the word 'codec'. The MMapDirectory, of
> > course, cheerfully gets to keep every single bit in memory. And then
> > each query runs, exercising the codec, building up a flurry of Java
> > objects, all of which turn into garbage and we start all over. So, I
> > find myself wondering, is there some sort of an opportunity for a
> > codec-that-caches in here? In other words, I'd like to sell some of my
> > space to buy some time.
>
