lucene-dev mailing list archives

From "Michael McCandless (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-1585) Allow to control how payloads are merged
Date Fri, 07 May 2010 14:41:50 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-1585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12865176#action_12865176 ]

Michael McCandless commented on LUCENE-1585:
--------------------------------------------

bq. PayloadProcessorProvider will accept both a Directory and a Term, and will return a suitable
PayloadProcessor for that Directory, and if needed, for the Directory+Term combination.
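A minimal sketch of the lookup proposed in the quote. It assumes the merger asks the provider for a processor once per term it visits. Plain Strings stand in for Lucene's Directory and Term (field + term text) so the sketch compiles on its own. The method name `getProcessor`, its signature and the "null means copy unchanged" convention are hypothetical illustrations, not the actual Lucene API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; not Lucene classes. Strings stand in for Directory and Term.
class PayloadLookupSketch {

  /** Rewrites the payload of a single posting. */
  interface PayloadProcessor {
    byte[] process(byte[] payload);
  }

  /** Picks a processor per source directory, optionally refined per term. */
  interface PayloadProcessorProvider {
    /** Returns null when payloads should be copied unchanged (assumed convention). */
    PayloadProcessor getProcessor(String directory, String field, String termText);
  }

  /** Example provider: only body:lucene coming from directory "old" is rewritten. */
  static PayloadProcessorProvider example() {
    Map<String, PayloadProcessor> byTerm = new HashMap<>();
    // Illustrative conversion only: increment the first payload byte.
    byTerm.put("body:lucene", payload -> new byte[] {(byte) (payload[0] + 1)});
    // In this sketch the lookup runs once for every term the merge visits.
    return (dir, field, text) -> "old".equals(dir) ? byTerm.get(field + ":" + text) : null;
  }
}
```

Because the provider is consulted per term, even a cheap map lookup like this one is multiplied by the number of terms being merged.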

OK, though this is potentially rather costly -- a huge number of terms are visited when merging.
I guess PPP (PayloadProcessorProvider) impls would reuse instances of PP (PayloadProcessor)? But
then how would they handle threads, given that multiple threads may be merging at once? Maybe we
need three tiers (PPP, PP, PPperTerm), such that each PP is used by only one thread in Lucene.
Hmm... getting hairy.
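The three-tier idea could look roughly like the sketch below, which rests on a few assumptions. The provider is shared by all merge threads, so it holds no mutable state. It creates one processor per merge, and that processor is confined to a single merging thread, so it can reuse a single per-term instance without locking. All class and method names here are hypothetical, and the "legacy" field with its byte-reversal conversion is a stand-in for a real format change.

```java
// Hypothetical three-tier sketch: Provider (shared) -> MergeProcessor (one per merge,
// one thread) -> TermProcessor (reused per term). None of these are Lucene classes.
class ThreeTierSketch {

  /** Tier 3: rewrites the payloads of a single term. */
  interface TermProcessor {
    byte[] process(byte[] payload);
  }

  /** Tier 2: created once per merge and used only by the thread running that merge. */
  static final class MergeProcessor {
    private String currentField; // mutable; safe only because this object is thread-confined

    // One reused tier-3 instance; it reads currentField on every call.
    private final TermProcessor reused =
        payload -> "legacy".equals(currentField) ? reverse(payload) : payload;

    /** Re-targets the shared instance instead of allocating one per term. */
    TermProcessor forTerm(String field, String termText) {
      currentField = field;
      return reused;
    }
  }

  /** Tier 1: shared across merge threads, so it only hands out fresh tier-2 objects. */
  static final class Provider {
    MergeProcessor newMergeProcessor() {
      return new MergeProcessor();
    }
  }

  /** Stand-in "conversion": reverses the payload bytes. */
  static byte[] reverse(byte[] b) {
    byte[] r = new byte[b.length];
    for (int i = 0; i < b.length; i++) {
      r[i] = b[b.length - 1 - i];
    }
    return r;
  }
}
```

The reuse has a price: a TermProcessor returned by `forTerm` is only valid until the next `forTerm` call on the same MergeProcessor, which is part of why this design gets hairy.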

bq. Any ideas how I can get the Term this posting belongs to? (I know there is no Term, but
field + BytesRef will do).

Maybe set current field & current term in MergeState?
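One way to read this suggestion: the merger updates the current field and term on a per-merge state object before it copies that term's postings, and the payload processor reads them from there instead of receiving a Term. The sketch below models only that. The fields `currentField`/`currentTerm`, the `merge` loop and the parallel-array input are hypothetical. MergeState is the name from the comment, but this class is a stand-in, and a String replaces the BytesRef term.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of exposing the current field/term through merge state.
class MergeStateSketch {

  /** Stand-in for a per-merge state object; the two fields are assumptions. */
  static final class MergeState {
    String currentField;
    String currentTerm; // would be a BytesRef in the real index code
  }

  /** Processor that looks up field/term in the state rather than taking a Term. */
  interface PayloadProcessor {
    byte[] process(MergeState state, byte[] payload);
  }

  /** Simulated merge loop: set field/term first, then process that term's payload. */
  static List<String> merge(String[][] terms, byte[][] payloads, PayloadProcessor pp) {
    MergeState state = new MergeState();
    List<String> log = new ArrayList<>();
    for (int i = 0; i < terms.length; i++) {
      state.currentField = terms[i][0];
      state.currentTerm = terms[i][1];
      byte[] out = pp.process(state, payloads[i]);
      log.add(state.currentField + ":" + state.currentTerm + "=" + out.length);
    }
    return log;
  }
}
```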

> Allow to control how payloads are merged
> ----------------------------------------
>
>                 Key: LUCENE-1585
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1585
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Index
>            Reporter: Michael Busch
>            Assignee: Shai Erera
>            Priority: Minor
>             Fix For: 3.1, 4.0
>
>         Attachments: LUCENE-1585_3x.patch, LUCENE-1585_3x.patch, LUCENE-1585_trunk.patch
>
>
> Lucene handles backwards-compatibility of its data structures by
> converting them from the old into the new formats during segment
> merging.
>
> Payloads are simply byte arrays in which users can store arbitrary
> data. Applications that use payloads might want to convert the format
> of their payloads in a similar fashion. Otherwise it is not easily
> possible to change the encoding of a payload without reindexing.
>
> So I propose to introduce a PayloadMerger class that the SegmentMerger
> invokes to merge the payloads from multiple segments. Users can then
> implement their own PayloadMerger to convert payloads from an old into
> a new format.
>
> In the future we will also need this kind of flexibility for column-stride
> fields (LUCENE-1231) and flexible indexing codecs.
>
> In addition, it would be nice if users could store version
> information in the segments file. E.g., they could store "in segment _2
> the term a:b uses payloads of format x.y".
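The PayloadMerger proposal, together with the per-segment format information it mentions, can be sketched as follows. Several assumptions are involved. The `PayloadMerger` interface shape and `recordFormat` are hypothetical. Format "1.0" is assumed to be a 4-byte big-endian int. The new format is a variable-length int (7 bits per byte, low-order group first, high bit set on every byte except the last). Terms with no recorded format are assumed to already use the new format.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; these names and formats are illustrative, not Lucene APIs.
class PayloadUpgradeSketch {

  /** Hook the segment merger would call for each payload it copies. */
  interface PayloadMerger {
    byte[] merge(String segment, String field, String text, byte[] payload);
  }

  /** Converts only terms whose segment recorded the old payload format. */
  static final class VersionAwareMerger implements PayloadMerger {
    // e.g. "_2/a:b" -> "1.0" ("in segment _2 the term a:b uses payloads of format 1.0")
    private final Map<String, String> formats = new HashMap<>();

    void recordFormat(String segment, String field, String text, String version) {
      formats.put(segment + "/" + field + ":" + text, version);
    }

    @Override
    public byte[] merge(String segment, String field, String text, byte[] payload) {
      String version = formats.get(segment + "/" + field + ":" + text);
      // Unrecorded terms are assumed to be in the new format already: copy as-is.
      return "1.0".equals(version) ? fixedIntToVInt(payload) : payload;
    }
  }

  /** Old format (assumed): 4-byte big-endian int. New format: variable-length int. */
  static byte[] fixedIntToVInt(byte[] oldPayload) {
    int value = ByteBuffer.wrap(oldPayload).getInt();
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    while ((value & ~0x7F) != 0) {
      out.write((value & 0x7F) | 0x80); // low 7 bits plus continuation flag
      value >>>= 7;
    }
    out.write(value); // final byte, high bit clear
    return out.toByteArray();
  }
}
```

For example, the old payload for 300 (`00 00 01 2C`) becomes the two bytes `AC 02`, while a payload from a segment with no recorded format passes through untouched.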

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

