lucene-dev mailing list archives

From "Shai Erera (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-1585) Allow to control how payloads are merged
Date Fri, 07 May 2010 11:31:47 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-1585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12865113#action_12865113 ]

Shai Erera commented on LUCENE-1585:
------------------------------------

I hate it when this happens, but better sooner than later: I realized the API must take the
current Term into account. We cannot process all the payloads in the index the same way. So
how about the following:
* PayloadProcessorProvider will accept both a Directory and a Term, and will return a suitable
PayloadProcessor for that Directory and, if needed, for the Directory+Term combination.
* PayloadProcessor will continue to work as is and will expose the same API: a payload is
still a payload. It's the responsibility of PPP to return the right PP instance for the given
Dir+Term.
It does not make sense to require that the payloads of all the terms in the incoming indexes
be processed. Specifically, the scenario I have at hand needs to rewrite the payloads of certain
postings only, but the index contains payloads in other postings as well.
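
A minimal Java sketch of the proposed lookup, for illustration only. Every type below (Term as a record, PayloadProcessor, PayloadProcessorProvider, SelectiveProvider, mergePayload) is a hypothetical stand-in rather than Lucene's API, and the Directory is assumed to be identified by a plain String. The provider hands back a processor per Directory+Term; a null return means that term's payloads are copied unchanged, which covers the case where only certain postings need rewriting.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a provider keyed by Directory + Term; not Lucene's API.
public class PayloadProcessorSketch {

    /** Stand-in for a Lucene Term: a field name plus term text. */
    public record Term(String field, String text) {}

    /** Rewrites a single payload ("a payload is still a payload"). */
    public interface PayloadProcessor {
        byte[] process(byte[] payload);
    }

    /** Returns the processor for a Directory+Term pair, or null to copy payloads unchanged. */
    public interface PayloadProcessorProvider {
        PayloadProcessor get(String directoryId, Term term);
    }

    /** Example provider: only explicitly registered (directory, term) pairs are rewritten. */
    public static final class SelectiveProvider implements PayloadProcessorProvider {
        private final Map<String, Map<Term, PayloadProcessor>> byDir = new HashMap<>();

        public void register(String directoryId, Term term, PayloadProcessor processor) {
            byDir.computeIfAbsent(directoryId, d -> new HashMap<>()).put(term, processor);
        }

        @Override
        public PayloadProcessor get(String directoryId, Term term) {
            Map<Term, PayloadProcessor> perTerm = byDir.get(directoryId);
            return perTerm == null ? null : perTerm.get(term);
        }
    }

    /** What a merger might do for each payload of a posting. */
    public static byte[] mergePayload(PayloadProcessorProvider provider, String directoryId,
                                      Term term, byte[] payload) {
        PayloadProcessor processor = provider.get(directoryId, term);
        return processor == null ? payload : processor.process(payload);
    }

    public static void main(String[] args) {
        SelectiveProvider provider = new SelectiveProvider();
        // Hypothetical rule: in directory "dirA", zero out the payloads of body:lucene only.
        provider.register("dirA", new Term("body", "lucene"), p -> new byte[p.length]);

        byte[] payload = {1, 2, 3};
        byte[] rewritten = mergePayload(provider, "dirA", new Term("body", "lucene"), payload);
        byte[] untouched = mergePayload(provider, "dirA", new Term("body", "search"), payload);
        System.out.println(Arrays.toString(rewritten)); // [0, 0, 0]
        System.out.println(Arrays.toString(untouched)); // [1, 2, 3]
    }
}
```

Returning null for unregistered terms keeps the common path (no rewriting) cheap; a provider could equally return a no-op processor if callers prefer never to check for null.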

For 3x that's easy: SMI (SegmentMergeInfo) holds the current Term being processed. But I don't
see an equivalent in trunk, in PostingsConsumer. It receives a DocsEnum, which does not tell
you the term it works on, and a MergeState, which only includes the FieldInfo and so gives you
the field name at most. Any ideas how I can get the Term this posting belongs to? (I know there
is no Term in trunk, but field + BytesRef will do.)
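
If the field + BytesRef pair is what ends up being available, it needs value equality to serve as a lookup key, because plain Java byte arrays compare by identity. A hypothetical sketch: FieldTermKey is not a Lucene class, and it assumes the term text arrives as UTF-8 bytes in a buffer/offset/length slice, the way BytesRef exposes it.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: identifies a posting's term by field name + raw term bytes,
// standing in for "field + BytesRef". FieldTermKey is not a Lucene class.
public final class FieldTermKey {
    private final String field;
    private final byte[] bytes;

    public FieldTermKey(String field, byte[] buffer, int offset, int length) {
        this.field = field;
        // Copy the slice: the enumerating code may reuse the same buffer for the next term.
        this.bytes = Arrays.copyOfRange(buffer, offset, offset + length);
    }

    /** Convenience factory from term text, encoded as UTF-8. */
    public static FieldTermKey of(String field, String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new FieldTermKey(field, utf8, 0, utf8.length);
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof FieldTermKey)) return false;
        FieldTermKey other = (FieldTermKey) o;
        // Compare array contents, not array identity.
        return field.equals(other.field) && Arrays.equals(bytes, other.bytes);
    }

    @Override
    public int hashCode() {
        return 31 * field.hashCode() + Arrays.hashCode(bytes);
    }

    @Override
    public String toString() {
        return field + ":" + new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        Set<FieldTermKey> termsToRewrite = new HashSet<>();
        termsToRewrite.add(FieldTermKey.of("body", "lucene"));

        // A shared buffer holding "xxluceneyy"; the term occupies offset 2, length 6.
        byte[] buffer = "xxluceneyy".getBytes(StandardCharsets.UTF_8);
        FieldTermKey fromSlice = new FieldTermKey("body", buffer, 2, 6);
        System.out.println(termsToRewrite.contains(fromSlice)); // true
    }
}
```

The constructor copies the slice on the assumption that the buffer can be overwritten when the enum advances; holding a reference instead would silently change the key after it was stored.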

Mike: I'll add PP as a required argument to SM (SegmentMerger), no problem. I only suggested
passing IW (IndexWriter) so that we could avoid changing the signature in the future, but
explicit arguments are fine by me.

> Allow to control how payloads are merged
> ----------------------------------------
>
>                 Key: LUCENE-1585
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1585
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Index
>            Reporter: Michael Busch
>            Assignee: Shai Erera
>            Priority: Minor
>             Fix For: 3.1, 4.0
>
>         Attachments: LUCENE-1585_3x.patch, LUCENE-1585_3x.patch, LUCENE-1585_trunk.patch
>
>
> Lucene handles backwards-compatibility of its data structures by
> converting them from the old into the new formats during segment
> merging. 
> Payloads are simply byte arrays in which users can store arbitrary
> data. Applications that use payloads might want to convert the format
> of their payloads in a similar fashion. Otherwise it's not easily
> possible to ever change the encoding of a payload without reindexing.
> So I propose to introduce a PayloadMerger class that the SegmentMerger
> invokes to merge the payloads from multiple segments. Users can then
> implement their own PayloadMerger to convert payloads from an old into
> a new format.
> In the future we need this kind of flexibility also for column-stride
> fields (LUCENE-1231) and flexible indexing codecs.
> In addition to that it would be nice if users could store version
> information in the segments file. E.g. they could store "in segment _2
> the term a:b uses payloads of format x.y".
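
To make the description's use case concrete, here is a sketch of the per-payload conversion such a merge-time hook might apply. Both formats are assumptions chosen for illustration: the "old" payload is a 4-byte big-endian int and the "new" one is a 7-bit variable-length int in the style of Lucene's VInt encoding. PayloadFormatConverter and its methods are hypothetical names.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;

// Hypothetical sketch: converts a payload from an assumed "old" format
// (4-byte big-endian int) to an assumed "new" format (7-bit variable-length int).
public final class PayloadFormatConverter {

    /** Decodes the old format: exactly four big-endian bytes. */
    public static int decodeOld(byte[] payload) {
        if (payload.length != 4) {
            throw new IllegalArgumentException("expected 4 bytes, got " + payload.length);
        }
        return ByteBuffer.wrap(payload).getInt(); // ByteBuffer is big-endian by default
    }

    /** Encodes the new format: 7 bits per byte, low-order group first, high bit = more bytes follow. */
    public static byte[] encodeNew(int value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(5);
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
        return out.toByteArray();
    }

    /** Decodes the new format (used here to check the round trip). */
    public static int decodeNew(byte[] payload) {
        int value = 0;
        for (int i = 0, shift = 0; i < payload.length; i++, shift += 7) {
            value |= (payload[i] & 0x7F) << shift;
            if ((payload[i] & 0x80) == 0) break;
        }
        return value;
    }

    /** The per-payload conversion a merge-time hook could apply. */
    public static byte[] convert(byte[] oldPayload) {
        return encodeNew(decodeOld(oldPayload));
    }

    public static void main(String[] args) {
        byte[] old = ByteBuffer.allocate(4).putInt(300).array();
        byte[] converted = convert(old);
        System.out.println(converted.length + " bytes, value " + decodeNew(converted)); // 2 bytes, value 300
    }
}
```

Because the conversion only needs the payload bytes, it fits the "a payload is still a payload" model; deciding which segments and terms still use the old format is left to whatever selects the processor.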

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

