lucene-java-user mailing list archives

From Otis Gospodnetic <otis.gospodne...@gmail.com>
Subject Re: MergePolicy for append-only indices?
Date Tue, 28 Jan 2014 14:59:52 GMT
Thanks Mike(s) & Co.
Added https://issues.apache.org/jira/browse/LUCENE-5419

Sounds like a killer feature :)

Otis



On Wed, Jan 8, 2014 at 4:17 AM, Michael McCandless <lucene@mikemccandless.com> wrote:

> On Mon, Jan 6, 2014 at 3:42 PM, Michael Sokolov
> <msokolov@safaribooksonline.com> wrote:
> > I think the key optimization when there are no deletions is that you
> > don't need to renumber documents and can bulk-copy blocks of contiguous
> > documents, and that is independent of merge policy. I think :)
>
> Merging of term vectors and stored fields will always use bulk-copy
> for contiguous chunks of non-deleted docs, so for the append-only case
> these will be the max chunk size and be efficient.
>
> We have no codec that implements bulk merging for postings, which
> would be interesting to pursue: in the append-only case it's possible,
> and merging of postings is normally by far the most time-consuming
> step of a merge.
>
> Also, no RAM will be used holding the doc mapping, since the docIDs
> don't change.
>
> These benefits are independent of the MergePolicy.
>
> I think TieredMergePolicy will work fine for append-only; I'm not sure
> how you'd improve on its approach.  It will in general renumber the
> docs, so if that's a problem, apps should use LogByteSizeMP.
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
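
To make the quoted points about contiguous chunks and the doc mapping concrete, here is a minimal Java sketch. It is not Lucene's merge code: the class AppendOnlyMergeSketch and the methods liveRuns, hasDeletions and docMap are hypothetical names invented for illustration, and a segment is modeled as nothing more than a boolean array of live (non-deleted) flags.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; none of these names exist in Lucene.
class AppendOnlyMergeSketch {

    // Returns half-open ranges [start, end) of consecutive live docs.
    // Each range is a candidate for one bulk copy.
    public static List<int[]> liveRuns(boolean[] live) {
        List<int[]> runs = new ArrayList<>();
        int i = 0;
        while (i < live.length) {
            if (!live[i]) {
                i++;
                continue;
            }
            int start = i;
            while (i < live.length && live[i]) {
                i++;
            }
            runs.add(new int[] {start, i});
        }
        return runs;
    }

    // True if at least one doc in the segment is deleted.
    public static boolean hasDeletions(boolean[] live) {
        for (boolean b : live) {
            if (!b) {
                return true;
            }
        }
        return false;
    }

    // Per-doc mapping from old docID to new docID in the merged segment
    // (-1 for deleted docs). In this model it is only needed when the
    // segment has deletions; otherwise newId = docBase + oldId.
    public static int[] docMap(boolean[] live, int docBase) {
        int[] map = new int[live.length];
        int next = docBase;
        for (int i = 0; i < live.length; i++) {
            map[i] = live[i] ? next++ : -1;
        }
        return map;
    }

    public static void main(String[] args) {
        boolean[] appendOnly = {true, true, true, true, true};
        boolean[] withDeletes = {true, true, false, true, true};
        // Append-only: a single run spans the whole segment.
        System.out.println(liveRuns(appendOnly).size());
        // One deletion splits the segment into two shorter runs.
        System.out.println(liveRuns(withDeletes).size());
    }
}
```

In this model an append-only segment always yields one run covering every doc, and its mapping reduces to a constant offset, so no per-doc array has to be held in memory. A single deletion both splits the run and forces the per-doc array.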
