lucene-dev mailing list archives

From "Michael McCandless (JIRA)" <>
Subject [jira] Commented: (LUCENE-1585) Allow to control how payloads are merged
Date Mon, 10 May 2010 09:02:51 GMT


Michael McCandless commented on LUCENE-1585:

Make sure you fix the whitespace -- some indents are now tabs or 8
spaces, but should be 2.

bq. I believe the common use will be few PPs that handle few terms.

Or, maybe even more common will be per-Directory switching and
ignoring the Term?  E.g., if I changed my payload format (for all terms)
at some point...

Though we don't have great support for versioning of payloads during
searching... e.g., PayloadTermQuery doesn't make it simple to figure out
which Dir you are now searching...

My only concern w/ this API is that it has a built-in unnecessary
global perf/synchronization cost, by design: I'll have to use a sync'd
map or a thread local to implement that method.  Even if my app
ignores the Term, I'll need to sync.  This sync is global -- all
merges running concurrently, per Term, will share a single global
lock.
But it's only the Dir lookup that requires sync.
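A minimal sketch of the single-call shape, to make the cost concrete. It assumes a hypothetical `getProcessor(dirId, term)` method and uses plain strings in place of Lucene's `Directory` and `Term`; none of these names are Lucene API. Because the per-directory map is shared by every merge thread, each per-term call takes the same lock, even though this implementation ignores the Term.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical sketch: one method resolves both the directory and the term,
// so the shared map has to be guarded on every single call.
class CombinedLookup {
  private final Map<String, UnaryOperator<byte[]>> byDir = new HashMap<>();
  private int lockedLookups = 0; // how many times the global lock was taken

  synchronized void register(String dirId, UnaryOperator<byte[]> processor) {
    byDir.put(dirId, processor);
  }

  // Called once per term being merged. The term is ignored here, yet the
  // lock is still required because byDir is shared by all merge threads.
  synchronized UnaryOperator<byte[]> getProcessor(String dirId, String term) {
    lockedLookups++;
    return byDir.getOrDefault(dirId, UnaryOperator.identity());
  }

  synchronized int lockedLookups() {
    return lockedLookups;
  }

  public static void main(String[] args) {
    CombinedLookup provider = new CombinedLookup();
    provider.register("oldDir", p -> new byte[] {(byte) (p[0] + 1)});
    String[] terms = {"a:b", "a:c", "b:x"};
    for (String dir : new String[] {"oldDir", "newDir"}) {
      for (String term : terms) {
        provider.getProcessor(dir, term); // one locked call per term
      }
    }
    System.out.println("locked lookups: " + provider.lockedLookups()); // prints "locked lookups: 6"
  }
}
```

Two segments with three terms each already means six trips through the lock; a real merge touches far more terms than segments.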

So if, instead, the Dir lookup and the Term lookup were separate
method calls, I'd only need sync on the Dir lookup (called rarely
-- once per segment at the start of the merge).  The Term
lookup, called far, far more often, is guaranteed to be thread-private,
so it'd need no sync.
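Under the same kind of assumptions (hypothetical names, strings standing in for `Directory` and `Term`), the split version might look like the sketch below. Only `forDirectory` is synchronized, and the `DirProcessor` it returns is assumed to be confined to the merging thread, so `forTerm` needs no lock. The lock is taken once per segment rather than once per term.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical sketch of the split lookup: directory resolution is
// synchronized (once per segment at merge start); the returned object is
// used only by the merging thread, so its per-term method is unsynchronized.
class SplitLookup {
  // Per-directory view, assumed confined to a single merge thread.
  static final class DirProcessor {
    private final UnaryOperator<byte[]> processor;
    private int termLookups = 0;

    DirProcessor(UnaryOperator<byte[]> processor) {
      this.processor = processor;
    }

    // No lock: this object is never shared between threads in this sketch.
    UnaryOperator<byte[]> forTerm(String term) {
      termLookups++;
      return processor; // ignores the term, e.g. a format change for all terms
    }

    int termLookups() {
      return termLookups;
    }
  }

  private final Map<String, UnaryOperator<byte[]>> byDir = new HashMap<>();
  private int lockedLookups = 0;

  synchronized void register(String dirId, UnaryOperator<byte[]> processor) {
    byDir.put(dirId, processor);
  }

  // The only synchronized lookup: called once per segment being merged.
  synchronized DirProcessor forDirectory(String dirId) {
    lockedLookups++;
    return new DirProcessor(byDir.getOrDefault(dirId, UnaryOperator.identity()));
  }

  synchronized int lockedLookups() {
    return lockedLookups;
  }

  public static void main(String[] args) {
    SplitLookup provider = new SplitLookup();
    provider.register("oldDir", p -> new byte[] {(byte) (p[0] + 1)});
    String[] terms = {"a:b", "a:c", "b:x"};
    int termCalls = 0;
    for (String dir : new String[] {"oldDir", "newDir"}) {
      DirProcessor dp = provider.forDirectory(dir); // locked, once per segment
      for (String term : terms) {
        dp.forTerm(term); // thread-private, no lock
      }
      termCalls += dp.termLookups();
    }
    // prints "locked: 2, unlocked: 6"
    System.out.println("locked: " + provider.lockedLookups() + ", unlocked: " + termCalls);
  }
}
```

The per-term path stays lock-free no matter how many terms a merge visits; the app can still switch behavior per Directory by returning a different `DirProcessor`.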

I guess in practice the sync cost may not be such a big deal?  So
maybe we could commit w/ this approach (it is experimental), even with
this limitation?  It's just that I don't like adding APIs which make
our concurrency worse... we are supposed to be moving in the other
direction :)

> Allow to control how payloads are merged
> ----------------------------------------
>                 Key: LUCENE-1585
>                 URL:
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Index
>            Reporter: Michael Busch
>            Assignee: Shai Erera
>            Priority: Minor
>             Fix For: 3.1, 4.0
>         Attachments: LUCENE-1585_3x.patch, LUCENE-1585_3x.patch, LUCENE-1585_3x.patch,
> Lucene handles backwards-compatibility of its data structures by
> converting them from the old into the new formats during segment
> merging. 
> Payloads are simply byte arrays in which users can store arbitrary
> data. Applications that use payloads might want to convert the format
> of their payloads in a similar fashion. Otherwise it's not easily
> possible to ever change the encoding of a payload without reindexing.
> So I propose to introduce a PayloadMerger class that the SegmentMerger
> invokes to merge the payloads from multiple segments. Users can then
> implement their own PayloadMerger to convert payloads from an old into
> a new format.
> In the future we need this kind of flexibility also for column-stride
> fields (LUCENE-1231) and flexible indexing codecs.
> In addition to that it would be nice if users could store version
> information in the segments file. E.g. they could store "in segment _2
> the term a:b uses payloads of format x.y".
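A rough sketch of what such a merge-time hook could do, combining both ideas from the description: a per-segment, per-term record of payload format versions, and conversion of old-format payloads while merging. Everything here is an assumption for illustration, not Lucene's API: the class and method names, and the two formats (version 1 as a 4-byte big-endian int, version 2 as a variable-length int using the 7-bits-per-byte, low-order-first layout Lucene uses for VInts).

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: look up which payload format a (segment, term) pair
// was written with, and rewrite old-format payloads into the current one.
class VersionedPayloadMerger {
  static final int OLD_FORMAT = 1; // assumed: 4-byte big-endian int
  static final int NEW_FORMAT = 2; // assumed: variable-length int

  // segment name -> term ("field:text") -> format version
  private final Map<String, Map<String, Integer>> versions = new HashMap<>();
  private final int defaultVersion;

  VersionedPayloadMerger(int defaultVersion) {
    this.defaultVersion = defaultVersion;
  }

  // E.g. record("_2", "a:b", OLD_FORMAT): "in segment _2 the term a:b uses format 1".
  void record(String segment, String term, int version) {
    versions.computeIfAbsent(segment, s -> new HashMap<>()).put(term, version);
  }

  int versionOf(String segment, String term) {
    return versions.getOrDefault(segment, Collections.emptyMap())
        .getOrDefault(term, defaultVersion);
  }

  // Called per payload while merging; always returns NEW_FORMAT bytes.
  // In this two-version sketch, anything not NEW_FORMAT is treated as OLD_FORMAT.
  byte[] merge(String segment, String term, byte[] payload) {
    if (versionOf(segment, term) == NEW_FORMAT) {
      return payload; // already current: pass through unchanged
    }
    return toVInt(ByteBuffer.wrap(payload, 0, 4).getInt()); // ByteBuffer is big-endian by default
  }

  // 7 bits per byte, low-order group first, high bit set when more bytes follow.
  static byte[] toVInt(int value) {
    ByteArrayOutputStream out = new ByteArrayOutputStream(5);
    while ((value & ~0x7F) != 0) {
      out.write((value & 0x7F) | 0x80);
      value >>>= 7;
    }
    out.write(value);
    return out.toByteArray();
  }

  public static void main(String[] args) {
    VersionedPayloadMerger merger = new VersionedPayloadMerger(NEW_FORMAT);
    merger.record("_2", "a:b", OLD_FORMAT);
    byte[] old = ByteBuffer.allocate(4).putInt(300).array(); // {0, 0, 1, 44}
    System.out.println(Arrays.toString(merger.merge("_2", "a:b", old))); // [-84, 2]
    byte[] current = {5};
    System.out.println(Arrays.toString(merger.merge("_3", "a:b", current))); // [5]
  }
}
```

Recording the version per segment is what lets a merge mix old and new segments safely: each payload is converted based on where it came from, not on a global switch.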

This message is automatically generated by JIRA.