lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Yonik Seeley" <yo...@apache.org>
Subject Re: improve how IndexWriter uses RAM to buffer added documents
Date Mon, 30 Apr 2007 13:18:50 GMT
On 4/30/07, Michael McCandless (JIRA) <jira@apache.org> wrote:
> After discussion on java-dev last time, I decided to retry the
> "persistent hash" approach, where the Postings hash lasts across many
> docs and then a single flush produces a partial segment containing all
> of those docs.  This is in contrast to the previous approach where
> each doc makes its own segment and then they are merged.
>
> It turns out this is even faster than my previous approach,

Go, Mike, go!

> With this new approach, as I process each term in the document I
> immediately write the prox/freq in their compact (vints) format into
> shared byte[] buffers, rather than accumulating int[] arrays that then
> need to be re-processed into the vint encoding.  This speeds things up
> because we don't double-process the postings.

Good idea!

>  It also uses less
> per-document RAM overhead because intermediate postings are stored as
> vints not as ints.

I'm just trying to follow along at a high level...how do you handle
intermediate termdocs?

-Yonik

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


Mime
View raw message