lucene-dev mailing list archives

From "Shai Erera (JIRA)" <>
Subject [jira] Commented: (LUCENE-1585) Allow to control how payloads are merged
Date Tue, 11 May 2010 19:23:43 GMT


Shai Erera commented on LUCENE-1585:

I've reviewed the patch again, and I think setPPP should move from IWC to IW. PPP is more
of a temporary setting: if you only want it for addIndexes*, you will probably set it just
before the call and unset it afterwards. Otherwise getDirPP would be called unnecessarily
during merges where you don't care about the result. So PPP is like InfoStream in a sense:
usually it applies to a point-in-time operation. You can still set it right after IW is
created if you want to use it for other merges too.
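
To make the set-before / unset-after pattern concrete, here is a rough sketch. It does not use
the Lucene classes or the patch's API: SketchProcessor, SketchWriter and PointInTimeDemo are
hypothetical stand-ins, and the "index" is just a list of byte arrays. It only shows that a
setter on the writer lets the processor apply to a single addIndexes call.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a payload processor; not a Lucene type.
interface SketchProcessor {
    byte[] process(byte[] payload);
}

// Hypothetical stand-in for a writer with a mutable, point-in-time setting.
final class SketchWriter {
    private SketchProcessor processor;          // null means "copy payloads as-is"
    private final List<byte[]> payloads = new ArrayList<>();
    int processorCalls = 0;                     // counts how often the processor was consulted

    void setProcessor(SketchProcessor p) {
        processor = p;
    }

    void addIndexes(List<byte[]> incoming) {
        for (byte[] p : incoming) {
            if (processor != null) {
                processorCalls++;
                payloads.add(processor.process(p));
            } else {
                payloads.add(p.clone());
            }
        }
    }

    List<byte[]> payloads() {
        return payloads;
    }
}

final class PointInTimeDemo {
    static SketchWriter run() {
        SketchWriter writer = new SketchWriter();
        writer.addIndexes(List.of(new byte[] {1}));   // no processor set: nothing is consulted

        // Set the processor only for the one call that needs it ...
        writer.setProcessor(p -> new byte[] {(byte) (p[0] + 10)});
        try {
            writer.addIndexes(List.of(new byte[] {2}));
        } finally {
            writer.setProcessor(null);                // ... and unset it afterwards
        }

        writer.addIndexes(List.of(new byte[] {3}));   // back to plain copying
        return writer;
    }
}
```

With a write-once config object, the same effect would need either a processor that is
consulted on every merge or a second writer instance created just for that call.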

Since IWC is a "write-once" object (documented), it doesn't make sense to set PPP whenever
you create an IW, just because at some point you know addIndexes will be called. Nor does
it make sense to create a new IW instance for that purpose only. So I really feel
it should be an IW setting and not IWC.

What do you think?

> Allow to control how payloads are merged
> ----------------------------------------
>                 Key: LUCENE-1585
>                 URL:
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Index
>            Reporter: Michael Busch
>            Assignee: Shai Erera
>            Priority: Minor
>             Fix For: 3.1, 4.0
>         Attachments: LUCENE-1585_3x.patch, LUCENE-1585_3x.patch, LUCENE-1585_3x.patch, LUCENE-1585_3x.patch, LUCENE-1585_trunk.patch
> Lucene handles backwards-compatibility of its data structures by
> converting them from the old into the new formats during segment
> merging. 
> Payloads are simply byte arrays in which users can store arbitrary
> data. Applications that use payloads might want to convert the format
> of their payloads in a similar fashion. Otherwise it's not easily
> possible to ever change the encoding of a payload without reindexing.
> So I propose to introduce a PayloadMerger class that the SegmentMerger
> invokes to merge the payloads from multiple segments. Users can then
> implement their own PayloadMerger to convert payloads from an old into
> a new format.
> In the future we need this kind of flexibility also for column-stride
> fields (LUCENE-1231) and flexible indexing codecs.
> In addition to that it would be nice if users could store version
> information in the segments file. E.g. they could store "in segment _2
> the term a:b uses payloads of format x.y".
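
The PayloadMerger idea in the issue description above could look roughly like the sketch
below. Everything in it is an assumption made for illustration: the interface shape, the
names VersioningMerger and SegmentMergeSketch, and the two payload formats (an old format
of exactly 4 bytes, and a new format that prefixes a version byte). Segments are modelled
as plain lists of byte arrays, not as Lucene segments.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical interface: invoked once per payload while segments are merged.
interface PayloadMerger {
    byte[] merge(byte[] payload);
}

// Converts the assumed old format (exactly 4 bytes) to the assumed new format
// (one version byte followed by the same 4 bytes). Anything else is passed through.
final class VersioningMerger implements PayloadMerger {
    static final byte NEW_FORMAT_VERSION = 2;

    @Override
    public byte[] merge(byte[] payload) {
        if (payload.length != 4) {
            return payload;                       // assumed to be in the new format already
        }
        byte[] converted = new byte[5];
        converted[0] = NEW_FORMAT_VERSION;
        System.arraycopy(payload, 0, converted, 1, 4);
        return converted;
    }
}

// Stand-in for the merge loop: every payload from every segment passes through the merger.
final class SegmentMergeSketch {
    static List<byte[]> mergeSegments(List<List<byte[]>> segments, PayloadMerger merger) {
        List<byte[]> merged = new ArrayList<>();
        for (List<byte[]> segment : segments) {
            for (byte[] payload : segment) {
                merged.add(merger.merge(payload));
            }
        }
        return merged;
    }
}
```

Detecting the old format by length alone only works under the stated assumption; storing
per-segment version information, as suggested at the end of the description, would remove
the guesswork.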
