lucene-dev mailing list archives

From "Shai Erera (JIRA)" <>
Subject [jira] Commented: (LUCENE-1585) Allow to control how payloads are merged
Date Sat, 08 May 2010 17:50:47 GMT


Shai Erera commented on LUCENE-1585:

I've been thinking about the multi-threading issue, and as far as I understand, it only concerns
local segment merging? PPP works w/ Directory+Term because the format of the payloads
is per term for the entire Directory (not per segment). Therefore, I don't think there are
multi-threading issues with the external Directories (the result of addIndexes*)?

For the local segments, I see what you mean - it is possible that several threads will ask
for a PP for the same Dir+Term. PPP implementations can still work well in such a scenario (if they
wish to process payloads of the local Dir as well) by holding a ThreadLocal PP per Dir+Term combination?
I think proper documentation should be enough in this case. The whole point of this issue
is to allow better control when addIndexes* are used. Affecting local payloads is a nice bonus,
and I think we should wait for a real scenario which takes advantage of that. If the
documentation warnings about threading don't help, we can then discuss how to solve it?
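
To make the ThreadLocal idea concrete, here is a rough sketch of what I have in mind. It does not use the classes from the patch: PayloadProcessorSketch, CountingProcessor and ThreadLocalProviderSketch are hypothetical stand-ins, and the Dir+Term key is reduced to plain strings (directory name, field, term text) so the example compiles on its own. The only point is that each merge thread gets its own PP instance per Dir+Term, so a stateful PP is never shared between threads.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a PP; NOT the class from the patch.
interface PayloadProcessorSketch {
    byte[] process(byte[] payload);
}

// A deliberately stateful processor: the unsynchronized counter would be
// unsafe if one instance were shared by several merge threads.
class CountingProcessor implements PayloadProcessorSketch {
    int calls = 0;

    @Override
    public byte[] process(byte[] payload) {
        calls++;
        return payload.clone(); // a real PP would convert the payload format here
    }
}

// Hypothetical PPP-like provider that keeps one PP per thread and per Dir+Term.
class ThreadLocalProviderSketch {
    // Each thread lazily gets its own private map from Dir+Term key to PP.
    private final ThreadLocal<Map<String, PayloadProcessorSketch>> perThread =
        ThreadLocal.withInitial(HashMap::new);

    PayloadProcessorSketch get(String dirName, String field, String text) {
        // Assumption: a NUL-separated string is good enough as a Dir+Term key here.
        String key = dirName + '\0' + field + '\0' + text;
        return perThread.get().computeIfAbsent(key, k -> new CountingProcessor());
    }
}
```

Within one thread, repeated calls to get() for the same Dir+Term return the same PP; another thread asking for the same Dir+Term gets a separate instance. One caveat of this approach is that pooled threads keep their cached PPs alive until the thread dies or the provider is dropped.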

> Allow to control how payloads are merged
> ----------------------------------------
>                 Key: LUCENE-1585
>                 URL:
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Index
>            Reporter: Michael Busch
>            Assignee: Shai Erera
>            Priority: Minor
>             Fix For: 3.1, 4.0
>         Attachments: LUCENE-1585_3x.patch, LUCENE-1585_3x.patch, LUCENE-1585_trunk.patch
> Lucene handles backwards-compatibility of its data structures by
> converting them from the old into the new formats during segment
> merging. 
> Payloads are simply byte arrays in which users can store arbitrary
> data. Applications that use payloads might want to convert the format
> of their payloads in a similar fashion. Otherwise it's not easily
> possible to ever change the encoding of a payload without reindexing.
> So I propose to introduce a PayloadMerger class that the SegmentMerger
> invokes to merge the payloads from multiple segments. Users can then
> implement their own PayloadMerger to convert payloads from an old into
> a new format.
> In the future we need this kind of flexibility also for column-stride
> fields (LUCENE-1231) and flexible indexing codecs.
> In addition to that it would be nice if users could store version
> information in the segments file. E.g. they could store "in segment _2
> the term a:b uses payloads of format x.y".

