lucene-dev mailing list archives

From "Shai Erera (JIRA)" <>
Subject [jira] Commented: (LUCENE-1585) Allow to control how payloads are merged
Date Fri, 07 May 2010 15:22:48 GMT


Shai Erera commented on LUCENE-1585:

How to handle that case is entirely up to the PPP impl. Some will return the same PP for all
terms, but maybe different ones per directory, while others will only care about a few terms,
returning null for all the rest. In fact, I think the common case will be either handling
all payloads with the same PP, or handling some select terms with one or more PPs.
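
To make the "few select terms, null for the rest" case concrete, here is a minimal sketch. It does not use the interfaces from the patch: PayloadProcessor, PayloadProcessorProvider, SelectTermsProvider and the UPPERCASE conversion are hypothetical stand-ins invented for illustration, and terms and directories are plain strings.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class SelectTermsProviderSketch {

    /** Hypothetical stand-in for a PP: rewrites a single payload. */
    interface PayloadProcessor {
        byte[] process(byte[] payload);
    }

    /** Hypothetical stand-in for a PPP: picks a PP per directory and term. */
    interface PayloadProcessorProvider {
        /** Returns null when payloads of this term should be copied unchanged. */
        PayloadProcessor getProcessor(String directory, String term);
    }

    /** Handles only the registered terms and returns null for all the rest. */
    static final class SelectTermsProvider implements PayloadProcessorProvider {
        private final Map<String, PayloadProcessor> byTerm = new HashMap<>();

        SelectTermsProvider register(String term, PayloadProcessor pp) {
            byTerm.put(term, pp);
            return this;
        }

        @Override
        public PayloadProcessor getProcessor(String directory, String term) {
            // The directory is ignored here; an impl could just as well key on it.
            return byTerm.get(term);
        }
    }

    /** Made-up "format upgrade": upper-cases an ASCII payload. */
    static final PayloadProcessor UPPERCASE = payload ->
        new String(payload, StandardCharsets.US_ASCII)
            .toUpperCase(Locale.ROOT)
            .getBytes(StandardCharsets.US_ASCII);

    public static void main(String[] args) {
        PayloadProcessorProvider ppp = new SelectTermsProvider().register("a:b", UPPERCASE);
        for (String term : new String[] {"a:b", "c:d"}) {
            PayloadProcessor pp = ppp.getProcessor("dir1", term);
            byte[] in = "x.y".getBytes(StandardCharsets.US_ASCII);
            // A null processor means: keep the payload as it is.
            byte[] out = (pp == null) ? in : pp.process(in);
            System.out.println(term + " -> " + new String(out, StandardCharsets.US_ASCII));
        }
    }
}
```

A provider that returns the same PP for every term would simply ignore the term argument instead of the directory.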

As for threading, this is also something the PPP can take care of. Strangely, flex allows
stateless PPs more easily b/c it uses BytesRef, while in 3x one needs to call both process
and payloadLength(), and hence concurrency is more of a problem.
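
A rough illustration of why the two-call style pushes state into the PP, while a single call on a mutable slice does not. Everything here is hypothetical: Slice only mimics the idea behind BytesRef, TwoCallProcessor and SliceProcessor are not the classes from the patch, and the conversion (dropping a leading one-byte version header) is made up.

```java
import java.util.Arrays;

public class StatelessVsStatefulSketch {

    /** Hypothetical mutable slice, loosely modelled on the idea of a BytesRef. */
    static final class Slice {
        byte[] bytes;
        int offset;
        int length;

        Slice(byte[] bytes, int offset, int length) {
            this.bytes = bytes;
            this.offset = offset;
            this.length = length;
        }
    }

    /** Two-call style: the new length is read back separately, so it must be stored. */
    static final class TwoCallProcessor {
        private int lastLength; // shared between process() and payloadLength()

        /** Assumes length >= 1; drops the first byte of the payload. */
        byte[] process(byte[] data, int start, int length) {
            lastLength = length - 1;
            return Arrays.copyOfRange(data, start + 1, start + length);
        }

        int payloadLength() {
            return lastLength;
        }
    }

    /** Single-call style: the result lives in the argument, the processor has no fields. */
    static final class SliceProcessor {
        /** Assumes s.length >= 1; drops the first byte by moving the window. */
        void process(Slice s) {
            s.offset += 1;
            s.length -= 1;
        }
    }

    public static void main(String[] args) {
        byte[] payload = {1, 10, 20};

        TwoCallProcessor twoCall = new TwoCallProcessor();
        byte[] converted = twoCall.process(payload, 0, payload.length);
        System.out.println(Arrays.toString(converted) + " length=" + twoCall.payloadLength());

        Slice slice = new Slice(payload, 0, payload.length);
        new SliceProcessor().process(slice);
        System.out.println("offset=" + slice.offset + " length=" + slice.length);
    }
}
```

If two threads shared one TwoCallProcessor, a second process() call could overwrite lastLength before the first caller reads it; SliceProcessor has nothing to overwrite.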

I believe the common use will be a few PPs that handle a few terms. Of course once this is out,
people will find original uses for it :). But for now, I don't see a big perf hit...

About performance: we're checking for every position and doc whether the processor is not
null. I guess it is better than having a no-op processor? Maybe I can factor that code out
into two methods - one that always assumes there is a processor and one that doesn't?
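
The two-method idea could look roughly like this. It is only a sketch, not the SegmentMerger code: the payloads are modelled as a plain list, the processor as a UnaryOperator, and all method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

public class MergeLoopSketch {

    /** Checks for a processor on every single payload. */
    static List<byte[]> mergeCheckingEachTime(List<byte[]> payloads, UnaryOperator<byte[]> pp) {
        List<byte[]> out = new ArrayList<>(payloads.size());
        for (byte[] p : payloads) {
            out.add(pp != null ? pp.apply(p) : p);
        }
        return out;
    }

    /** Checks once up front, then runs a loop that has no null check. */
    static List<byte[]> mergeCheckingOnce(List<byte[]> payloads, UnaryOperator<byte[]> pp) {
        return pp == null ? copyAll(payloads) : processAll(payloads, pp);
    }

    /** Loop variant that assumes there is no processor. */
    private static List<byte[]> copyAll(List<byte[]> payloads) {
        return new ArrayList<>(payloads);
    }

    /** Loop variant that assumes a non-null processor. */
    private static List<byte[]> processAll(List<byte[]> payloads, UnaryOperator<byte[]> pp) {
        List<byte[]> out = new ArrayList<>(payloads.size());
        for (byte[] p : payloads) {
            out.add(pp.apply(p));
        }
        return out;
    }

    public static void main(String[] args) {
        List<byte[]> payloads = new ArrayList<>();
        payloads.add(new byte[] {1});
        payloads.add(new byte[] {2, 3});
        // Made-up processor: replaces each payload with a single byte holding its length.
        UnaryOperator<byte[]> lengthOnly = p -> new byte[] {(byte) p.length};
        System.out.println(mergeCheckingOnce(payloads, lengthOnly).size());
        System.out.println(mergeCheckingOnce(payloads, null).size());
    }
}
```

Both variants return the same payloads; whether hoisting the check is measurably faster would need a benchmark.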

> Allow to control how payloads are merged
> ----------------------------------------
>                 Key: LUCENE-1585
>                 URL:
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Index
>            Reporter: Michael Busch
>            Assignee: Shai Erera
>            Priority: Minor
>             Fix For: 3.1, 4.0
>         Attachments: LUCENE-1585_3x.patch, LUCENE-1585_3x.patch, LUCENE-1585_trunk.patch
> Lucene handles backwards-compatibility of its data structures by
> converting them from the old into the new formats during segment
> merging. 
> Payloads are simply byte arrays in which users can store arbitrary
> data. Applications that use payloads might want to convert the format
> of their payloads in a similar fashion. Otherwise it's not easily
> possible to ever change the encoding of a payload without reindexing.
> So I propose to introduce a PayloadMerger class that the SegmentMerger
> invokes to merge the payloads from multiple segments. Users can then
> implement their own PayloadMerger to convert payloads from an old into
> a new format.
> In the future we need this kind of flexibility also for column-stride
> fields (LUCENE-1231) and flexible indexing codecs.
> In addition to that it would be nice if users could store version
> information in the segments file. E.g. they could store "in segment _2
> the term a:b uses payloads of format x.y".

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
