lucene-dev mailing list archives

From Michael McCandless <luc...@mikemccandless.com>
Subject Re: PayloadProcessorProvider Usage
Date Wed, 13 Apr 2011 18:49:49 GMT
Hmm... on option 1, how would you run into merges of target segments?
I think we currently do one big merge of the source segments, into one
segment in the target index?

But, the issue on option 2 is truly annoying.  We have the same
problem for apps that want to "upgrade" their index from 3.x to the
4.0 format (for example).

Maybe we need a new (expert) method... remergeIndex?
mergeAllSegments?  rebuildIndex?

Mike

http://blog.mikemccandless.com

On Wed, Apr 13, 2011 at 1:43 PM, Shai Erera <serera@gmail.com> wrote:
> Hey,
>
> In Lucene 3.1 we've introduced PayloadProcessorProvider which allows you to
> rewrite payloads of terms during merge. The main scenario is when you merge
> indexes, and you want to rewrite/remap payloads of the incoming indexes, but
> one can certainly use it to rewrite the payloads of a term, in a given
> index.
> When we worked on it, we thought of two ways the user can rewrite payloads
> when he merges indexes:
>
> 1) Set PPP on the target IW, call addIndexes(IndexReader), while PPP will be
> applied on the incoming directories only.
> 2) Set PPP on the source IW, call IW.optimize(), then use
> targetIW.addIndexes(Directory).
>
> The latter is better since in both cases the incoming segments are rewritten
> anyway, however in the first case you might run into merging segments of the
> target index as well, something you might want to avoid (that was the
> purpose of optimizing addIndexes(Directory)).
>
> But it turns out the latter is not so easy to achieve. If the source index
> has only 1 segment (at least in my case, ~100% of the time), then calling
> optimize() doesn't do anything because the MP thinks the index is already
> optimized and returns no MergeSpec. To overcome this, I wrote a
> ForceOptimizeMP which extends LogMP and forces optimize even if there is
> only one segment.
>
> Another option is to set the noCFSRatio to 1.0 and flip the useCompoundFile
> flag (i.e. if the source is compound, create non-compound and vice versa).
> That can work too, but I don't think it's very good, because the source index
> will be changed from compound to non-compound (or vice versa), which is
> something that the app didn't want.
>
> So I think option 1 is better, but I wanted to ask if someone knows of a
> better way to achieve this?
>
> Shai
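
Editorial illustration of the single-segment problem described in the quoted
message: the sketch below is a toy model only, not Lucene code. The names
ToyMergePolicy, DefaultToyPolicy, ForceToyPolicy and optimize are hypothetical
and do not correspond to Lucene's MergePolicy or IndexWriter APIs. It assumes
that segments are only rewritten (and so only pass through a payload
processor) when the policy selects them for a merge, which is the behaviour
the message describes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ForceOptimizeSketch {

    /** Hypothetical stand-in for a merge policy; NOT Lucene's MergePolicy API. */
    public interface ToyMergePolicy {
        /** Segments selected for an optimize request; empty means "nothing to do". */
        List<String> findMergesForOptimize(List<String> segments);
    }

    /** Models the described default: a single-segment index counts as already optimized. */
    public static class DefaultToyPolicy implements ToyMergePolicy {
        @Override
        public List<String> findMergesForOptimize(List<String> segments) {
            if (segments.size() <= 1) {
                return new ArrayList<>(); // no merge, so nothing is rewritten
            }
            return new ArrayList<>(segments);
        }
    }

    /** Models the "force" idea: always select every segment, even a lone one. */
    public static class ForceToyPolicy implements ToyMergePolicy {
        @Override
        public List<String> findMergesForOptimize(List<String> segments) {
            return new ArrayList<>(segments);
        }
    }

    /** Returns how many segments the policy selected, i.e. how many would be rewritten. */
    public static int optimize(List<String> segments, ToyMergePolicy policy) {
        return policy.findMergesForOptimize(segments).size();
    }

    public static void main(String[] args) {
        List<String> oneSegment = Arrays.asList("_0");
        System.out.println(optimize(oneSegment, new DefaultToyPolicy())); // prints 0
        System.out.println(optimize(oneSegment, new ForceToyPolicy()));   // prints 1
    }
}
```

In this model the default policy leaves a one-segment index untouched, so any
payload rewriting tied to the merge never runs; the forcing variant selects
the lone segment and it is rewritten.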


