lucene-dev mailing list archives

From "Erick Erickson (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (LUCENE-8264) Allow an option to rewrite all segments
Date Thu, 10 May 2018 04:37:00 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-8264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16469894#comment-16469894 ]

Erick Erickson edited comment on LUCENE-8264 at 5/10/18 4:36 AM:
-----------------------------------------------------------------

OK, we've pretty well disposed of the whole N-2 -> N upgrade issue: it ain't gonna happen.
There are still two other cases where this would be useful:

1> N-1 -> N
2> adding DocValues without re-indexing

Of the two, <2> is probably the more immediately useful. I've seen a lot of clients
in the field get hurt when they realize that they'd have been better off with docValues but
didn't have them turned on.

Since I'm working on TMP (TieredMergePolicy), that's where I'm focusing. How to implement? A new
method on MergePolicy that's a no-op for everything except TMP? See the discussion at LUCENE-8004,
but the gist is:

1> some new methods on MergePolicy that return information from the concrete policy, like
the default max merge segments (I don't particularly like that). Callers would have to "do the
right thing", which is trappy.

OR 

2> a new method on MergePolicy like {{findRewriteAllSegments}} that is essentially {{findForcedMerges}}
but makes some extra decisions. Currently it would be a pass-through for everything except TMP.
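
A minimal sketch of option 2 follows. Every type in it is a hypothetical, simplified stand-in and not Lucene's real MergePolicy API: segments are plain strings and a "merge" is just a list of segment names. It assumes "pass-through" means delegating to the forced-merge logic, and it assumes the "extra decision" a TMP-like policy makes is to schedule a rewrite of every segment even when the segment count is already under the requested maximum.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a merge policy; NOT the real org.apache.lucene.index.MergePolicy.
abstract class SketchMergePolicy {
    // Stand-in for findForcedMerges: returns groups of segment names to merge together.
    abstract List<List<String>> findForcedMerges(List<String> segments, int maxSegmentCount);

    // Proposed hook (the name comes from the comment, the signature is assumed).
    // Default behaviour: pass through to the forced-merge logic.
    List<List<String>> findRewriteAllSegments(List<String> segments, int maxSegmentCount) {
        return findForcedMerges(segments, maxSegmentCount);
    }
}

// A policy that only merges when there are too many segments,
// so some segments may never be rewritten.
class SketchSimplePolicy extends SketchMergePolicy {
    @Override
    List<List<String>> findForcedMerges(List<String> segments, int maxSegmentCount) {
        List<List<String>> merges = new ArrayList<>();
        if (segments.size() > maxSegmentCount) {
            merges.add(new ArrayList<>(segments)); // one merge covering everything
        }
        return merges;
    }
}

// TMP-like policy (assumed behaviour): rewrite every segment, one singleton merge each,
// regardless of maxSegmentCount.
class SketchTieredPolicy extends SketchSimplePolicy {
    @Override
    List<List<String>> findRewriteAllSegments(List<String> segments, int maxSegmentCount) {
        List<List<String>> merges = new ArrayList<>();
        for (String segment : segments) {
            merges.add(List.of(segment));
        }
        return merges;
    }
}

class RewriteAllSketchDemo {
    public static void main(String[] args) {
        List<String> segments = List.of("_0", "_1", "_2");
        System.out.println(new SketchSimplePolicy().findRewriteAllSegments(segments, 5));
        System.out.println(new SketchTieredPolicy().findRewriteAllSegments(segments, 5));
    }
}
```

With this shape callers invoke one method on any policy and do not need to know which concrete policy they hold, which is the trappiness option 1 would leave in place.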

Or is the right thing to do here to create, say, a new MergePolicy {{AddDocValuesBecaseYouDidntReadTheManualAboutWhyDocValuesWereAGoodThingMergePolicy}}?

Off the top of my head, it would take (somehow) a list of fields to add DocValues to and then
"do the right thing". I don't have any details worked out yet; I want to discuss before diving
in.

The requirement is that in a distributed system I can issue one command that'll fix this everywhere
I care about. I don't really have a clue how it'd deal with being applied twice in a row,
merging some segments with DocValues and some without, etc.
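
One possible way to think about the "applied twice in a row" question, sketched under heavy assumptions: if the command only selects segments that are still missing DocValues for a requested field, a second run finds nothing to do. The classes below are hypothetical bookkeeping only. They do not use Lucene's API and say nothing about how the DocValues would actually be built during a merge.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical segment descriptor: a name plus the fields that already have DocValues.
final class SegmentSketch {
    final String name;
    final Set<String> docValuesFields;

    SegmentSketch(String name, Set<String> docValuesFields) {
        this.name = name;
        this.docValuesFields = new HashSet<>(docValuesFields);
    }
}

final class AddDocValuesSketch {
    // Select only segments lacking DocValues for at least one wanted field.
    static List<SegmentSketch> segmentsNeedingRewrite(List<SegmentSketch> segments,
                                                      Set<String> wantedFields) {
        List<SegmentSketch> selected = new ArrayList<>();
        for (SegmentSketch segment : segments) {
            if (!segment.docValuesFields.containsAll(wantedFields)) {
                selected.add(segment);
            }
        }
        return selected;
    }

    // Stand-in for the rewrite itself: just records that the fields now have DocValues.
    static void rewrite(List<SegmentSketch> toRewrite, Set<String> wantedFields) {
        for (SegmentSketch segment : toRewrite) {
            segment.docValuesFields.addAll(wantedFields);
        }
    }

    public static void main(String[] args) {
        List<SegmentSketch> segments = List.of(
                new SegmentSketch("_0", Set.of("price")),
                new SegmentSketch("_1", Set.of()));
        Set<String> wanted = Set.of("price");

        List<SegmentSketch> firstPass = segmentsNeedingRewrite(segments, wanted);
        rewrite(firstPass, wanted);
        List<SegmentSketch> secondPass = segmentsNeedingRewrite(segments, wanted);
        System.out.println(firstPass.size() + " then " + secondPass.size());
    }
}
```

The mixed case (merging segments with and without DocValues) is not addressed here; the selection rule merely keeps already-converted segments out of later runs.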


was (Author: erickerickson):
See comment 9-May.

> Allow an option to rewrite all segments
> ---------------------------------------
>
>                 Key: LUCENE-8264
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8264
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Erick Erickson
>            Assignee: Erick Erickson
>            Priority: Major
>
> For the background, see SOLR-12259.
> There are several use-cases that would be much easier, especially during upgrades, if
> we could specify that all segments get rewritten.
> One example: Upgrading 5x->6x->7x. When segments are merged, they're rewritten
> into the current format. However, there's no guarantee that a particular segment _ever_ gets
> merged so the 6x-7x upgrade won't necessarily be successful.
> How many merge policies support this is an open question. I propose to start with TMP
> and raise other JIRAs as necessary for other merge policies.
> So far the usual response has been "re-index from scratch", but that's increasingly difficult
> as systems get larger.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

