lucene-dev mailing list archives

From "Shai Erera (JIRA)" <>
Subject [jira] Commented: (LUCENE-1585) Allow to control how payloads are merged
Date Fri, 07 May 2010 11:31:47 GMT


Shai Erera commented on LUCENE-1585:

I hate it when it happens, but better sooner than later: I realized the API must take the current
Term into account. We cannot process all the payloads in the index the same way. So
how about the following:
* PayloadProcessorProvider will accept both a Directory and a Term, and will return a suitable
PayloadProcessor for that Directory and, if needed, for the Directory+Term combination.
* PayloadProcessor will continue to work as is and will expose the same API - a payload is
still a payload. It's the responsibility of PPP to return the right PP instance for the given
Directory+Term.

It does not make sense to assume that the payloads of all terms in the incoming indexes need
to be processed. Specifically, the scenario I have at hand needs to rewrite the payloads of certain
postings only, but the index contains payloads in other postings as well.
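
A minimal sketch of the proposed split, under simplifying assumptions. A String stands in for the Directory. Term, PayloadProcessor, PayloadProcessorProvider, SingleFieldProvider and MergeSketch are hypothetical illustrations, not Lucene's actual classes or signatures. The provider returns null for postings that should be copied unchanged, so only the targeted payloads are rewritten while the processor API stays "a payload is still a payload".

```java
import java.util.Arrays;
import java.util.Objects;

/** Hypothetical stand-in for a term: a field name plus term text. */
record Term(String field, String text) {}

/** Hypothetical processor: same API for every payload, it just rewrites bytes. */
interface PayloadProcessor {
    byte[] process(byte[] payload);
}

/**
 * Hypothetical provider: picks a processor for a (directory, term) pair.
 * Returning null means "copy this term's payloads unchanged".
 */
interface PayloadProcessorProvider {
    PayloadProcessor getProcessor(String directoryName, Term term);
}

/** Example provider: only rewrites payloads of one field in one incoming directory. */
final class SingleFieldProvider implements PayloadProcessorProvider {
    private final String directoryName;
    private final String field;
    private final PayloadProcessor processor;

    SingleFieldProvider(String directoryName, String field, PayloadProcessor processor) {
        this.directoryName = Objects.requireNonNull(directoryName);
        this.field = Objects.requireNonNull(field);
        this.processor = Objects.requireNonNull(processor);
    }

    @Override
    public PayloadProcessor getProcessor(String directoryName, Term term) {
        if (this.directoryName.equals(directoryName) && field.equals(term.field())) {
            return processor;
        }
        return null; // every other posting keeps its payloads as-is
    }
}

final class MergeSketch {
    /** What a merger might do per posting: look up a processor and apply it if present. */
    static byte[] mergePayload(PayloadProcessorProvider ppp, String dir, Term term, byte[] payload) {
        PayloadProcessor pp = ppp.getProcessor(dir, term);
        return pp == null ? payload : pp.process(payload);
    }

    public static void main(String[] args) {
        // Hypothetical rewrite: prepend a format-version byte to the old payload.
        PayloadProcessor addVersion = p -> {
            byte[] out = new byte[p.length + 1];
            out[0] = 2;
            System.arraycopy(p, 0, out, 1, p.length);
            return out;
        };
        PayloadProcessorProvider ppp = new SingleFieldProvider("oldIndex", "cat", addVersion);
        byte[] a = mergePayload(ppp, "oldIndex", new Term("cat", "sports"), new byte[] {7});
        byte[] b = mergePayload(ppp, "oldIndex", new Term("body", "sports"), new byte[] {7});
        System.out.println(Arrays.toString(a) + " " + Arrays.toString(b));
    }
}
```

Running MergeSketch prints [2, 7] [7]: the payload of cat:sports is rewritten, while body:sports keeps its original payload.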

For 3x that's easy - SMI (SegmentMergeInfo) holds the current Term being processed. But I don't see
an equivalent in trunk, in PostingsConsumer. It receives a DocsEnum, which does not tell you the term
it works on, and a MergeState, which includes just a FieldInfo; that gives you the field name, but not
the term. Any ideas how I can get the Term this posting belongs to? (I know there is no Term, but field
+ BytesRef will do.)
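
One possible way around the missing term, sketched under the assumption that the terms-level merge loop hands the field name and raw term bytes down to the postings-level code. That loop iterates the terms and therefore already knows each one. TermAwarePostingsMerger and TermsLevelMerge are hypothetical classes. A byte[] stands in for BytesRef, and an integer count stands in for the actual postings.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.function.BiPredicate;

/**
 * Hypothetical postings-level merger. Instead of deriving the term from a docs
 * enumeration (which does not expose it), it is handed the field name and the
 * raw term bytes by the caller that iterates the terms.
 */
final class TermAwarePostingsMerger {
    private final BiPredicate<String, byte[]> needsRewrite;
    final List<String> decisions = new ArrayList<>();

    TermAwarePostingsMerger(BiPredicate<String, byte[]> needsRewrite) {
        this.needsRewrite = needsRewrite;
    }

    /** Records whether this term's payloads would be rewritten or copied. */
    void merge(String field, byte[] termBytes, int payloadCount) {
        String text = new String(termBytes, StandardCharsets.UTF_8);
        String action = needsRewrite.test(field, termBytes) ? "rewrite" : "copy";
        decisions.add(field + ":" + text + " -> " + action + " x" + payloadCount);
    }
}

final class TermsLevelMerge {
    /** The terms-level loop knows the current term, so it passes field + bytes down. */
    static void mergeField(String field, SortedMap<String, Integer> payloadCountsByTerm,
                           TermAwarePostingsMerger postings) {
        for (Map.Entry<String, Integer> e : payloadCountsByTerm.entrySet()) {
            byte[] termBytes = e.getKey().getBytes(StandardCharsets.UTF_8);
            postings.merge(field, termBytes, e.getValue());
        }
    }

    public static void main(String[] args) {
        // Hypothetical rule: only "body" terms starting with "v1_" need their payloads rewritten.
        TermAwarePostingsMerger postings = new TermAwarePostingsMerger(
            (field, bytes) -> field.equals("body")
                && new String(bytes, StandardCharsets.UTF_8).startsWith("v1_"));
        SortedMap<String, Integer> terms = new TreeMap<>();
        terms.put("v1_apple", 3);
        terms.put("banana", 1);
        mergeField("body", terms, postings);
        postings.decisions.forEach(System.out::println);
    }
}
```

Running it prints "body:banana -> copy x1" and then "body:v1_apple -> rewrite x3", since terms are visited in sorted order.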

Mike - I'll add PP as a required arg to SM (SegmentMerger), np. I was only suggesting passing IW
(IndexWriter) so that we could avoid changing the signature in the future, but explicit args are fine by me.

> Allow to control how payloads are merged
> ----------------------------------------
>                 Key: LUCENE-1585
>                 URL:
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Index
>            Reporter: Michael Busch
>            Assignee: Shai Erera
>            Priority: Minor
>             Fix For: 3.1, 4.0
>         Attachments: LUCENE-1585_3x.patch, LUCENE-1585_3x.patch, LUCENE-1585_trunk.patch
> Lucene handles backwards-compatibility of its data structures by
> converting them from the old into the new formats during segment
> merging. 
> Payloads are simply byte arrays in which users can store arbitrary
> data. Applications that use payloads might want to convert the format
> of their payloads in a similar fashion. Otherwise it's not easily
> possible to ever change the encoding of a payload without reindexing.
> So I propose to introduce a PayloadMerger class that the SegmentMerger
> invokes to merge the payloads from multiple segments. Users can then
> implement their own PayloadMerger to convert payloads from an old into
> a new format.
> In the future we need this kind of flexibility also for column-stride
> fields (LUCENE-1231) and flexible indexing codecs.
> In addition to that it would be nice if users could store version
> information in the segments file. E.g. they could store "in segment _2
> the term a:b uses payloads of format x.y".
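
To make the format-conversion idea concrete, here is a minimal sketch that upgrades payloads during a merge based on per-segment version metadata. Both formats are hypothetical. "1" is a fixed 4-byte big-endian int. "2" is the same value as a variable-length int: 7 bits per byte, low-order bits first, with the high bit set on every byte but the last. The Map stands in for the version information the description suggests storing in the segments file. PayloadFormatUpgrade is not a Lucene class.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.Map;

final class PayloadFormatUpgrade {
    /** Hypothetical format "1": a 4-byte big-endian int. */
    static int decodeV1(byte[] payload) {
        return ByteBuffer.wrap(payload).getInt();
    }

    /** Hypothetical format "2": the same int as a variable-length int (7 bits per byte). */
    static byte[] encodeV2(int value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80); // low 7 bits, continuation bit set
            value >>>= 7;
        }
        out.write(value); // last byte, continuation bit clear
        return out.toByteArray();
    }

    /** Converts a payload only if the segment recorded this term in the old format. */
    static byte[] upgrade(Map<String, String> formatByTerm, String term, byte[] payload) {
        String format = formatByTerm.getOrDefault(term, "2");
        return format.equals("1") ? encodeV2(decodeV1(payload)) : payload;
    }

    public static void main(String[] args) {
        // Hypothetical per-segment metadata: "in segment _2 the term a:b uses payloads of format 1".
        Map<String, String> segment2Formats = Map.of("a:b", "1");
        byte[] old = ByteBuffer.allocate(4).putInt(300).array();
        System.out.println(Arrays.toString(upgrade(segment2Formats, "a:b", old)));
    }
}
```

For the value 300, the old 4-byte payload becomes the 2-byte payload [-84, 2]. Terms with no recorded old format are passed through untouched.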

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
