Date: Thu, 10 May 2018 04:37:00 +0000 (UTC)
From: "Erick Erickson (JIRA)"
To: dev@lucene.apache.org
Subject: [jira] [Comment Edited] (LUCENE-8264) Allow an option to rewrite all segments

[ https://issues.apache.org/jira/browse/LUCENE-8264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16469894#comment-16469894 ]

Erick Erickson edited comment on LUCENE-8264 at 5/10/18 4:36 AM:
-----------------------------------------------------------------

OK, we've pretty well disposed of the whole N-2 -> N upgrade issue; it ain't gonna happen. There are still two other cases where this would be useful:

1> N-1 -> N
2> adding DocValues without re-indexing

Of the two, <2> is probably the most immediately useful. I've seen a lot of clients in the field get hurt when they realize they'd have been better off with docValues but didn't have them turned on. Since I'm working on TMP, that's where I'm focusing.

How to implement? A new method on MergePolicy that no-op'd for everything except TMP? See the discussion at LUCENE-8004, but the gist is:

1> some new methods on MergePolicy that return information from the concrete policy, like default max merge segments (I don't particularly like that). Callers would have to "do the right thing", which is trappy.

OR

2> a new method on MergePolicy like {{findRewriteAllSegments}} that is essentially {{findForcedMerges}} but makes some extra decisions. It would currently be a pass-through for everything except TMP.
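Option 2 can be pictured with a toy model. Everything in it is hypothetical: {{findRewriteAllSegments}} is only the name floated above, not an existing Lucene method, and `Seg`, `ToyMergePolicy`, `OtherToyPolicy` and `TieredToyPolicy` are illustrative stand-ins, not Lucene's `MergePolicy` or `TieredMergePolicy` (TMP). The base method is a pass-through that schedules nothing; the TMP-like override adds one assumed "extra decision": rewrite each old-format segment on its own instead of merging everything down to a target count.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only, NOT Lucene's classes: a segment is a name plus the
// major version of the format it was written with.
record Seg(String name, int formatMajor) {}

// Stand-in for MergePolicy. A "merge" is just the list of segments it rewrites;
// null means "nothing to do".
abstract class ToyMergePolicy {
    // Hypothetical method from option 2. Default: a pass-through that
    // schedules nothing, i.e. a no-op for every policy that doesn't override it.
    List<List<Seg>> findRewriteAllSegments(List<Seg> segments, int currentMajor) {
        return null;
    }
}

// Stand-in for any non-TMP policy: keeps the no-op default.
class OtherToyPolicy extends ToyMergePolicy {}

// Stand-in for TMP: like a forced merge, but with one assumed extra decision:
// every segment older than the current format is rewritten alone
// (a singleton merge), so the rewrite never concatenates segments.
class TieredToyPolicy extends ToyMergePolicy {
    @Override
    List<List<Seg>> findRewriteAllSegments(List<Seg> segments, int currentMajor) {
        List<List<Seg>> merges = new ArrayList<>();
        for (Seg s : segments) {
            if (s.formatMajor() < currentMajor) {
                merges.add(List.of(s)); // rewrite this segment by itself
            }
        }
        return merges.isEmpty() ? null : merges;
    }
}

class RewriteDemo {
    public static void main(String[] args) {
        List<Seg> segs = List.of(new Seg("_0", 6), new Seg("_1", 7), new Seg("_2", 6));
        System.out.println("other policy: " + new OtherToyPolicy().findRewriteAllSegments(segs, 7));
        System.out.println("TMP-like:     " + new TieredToyPolicy().findRewriteAllSegments(segs, 7));
    }
}
```

With segments in formats 6, 7 and 6 and a current major of 7, `OtherToyPolicy` schedules nothing, while `TieredToyPolicy` schedules two singleton rewrites, one for `_0` and one for `_2`. Running it a second time after those rewrites would schedule nothing, since no segment is older than the current format.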
Or is the right thing to do here to create, say, a new MergePolicy {{AddDocValuesBecaseYouDidntReadTheManualAboutWhyDocValuesWereAGoodThingMergePolicy}}? Off the top of my head, it would take (somehow) a list of fields to add DocValues to and then "do the right thing".

I don't have any details worked out yet and want to discuss before diving in. The requirement is that in a distributed system I can issue one command that'll fix this everywhere I care about. I don't really have a clue how it'd deal with being applied twice in a row, merging some segments with DocValues and some without, etc.

was (Author: erickerickson):
See comment 9-May.

> Allow an option to rewrite all segments
> ---------------------------------------
>
>                 Key: LUCENE-8264
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8264
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Erick Erickson
>            Assignee: Erick Erickson
>            Priority: Major
>
> For the background, see SOLR-12259.
> There are several use-cases that would be much easier, especially during upgrades, if we could specify that all segments get rewritten.
> One example: Upgrading 5x->6x->7x. When segments are merged, they're rewritten into the current format. However, there's no guarantee that a particular segment _ever_ gets merged, so the 6x->7x upgrade won't necessarily be successful.
> How many merge policies support this is an open question. I propose to start with TMP and raise other JIRAs as necessary for other merge policies.
> So far the usual response has been "re-index from scratch", but that's increasingly difficult as systems get larger.
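The upgrade gap described in the issue can be illustrated with a small simulation. It is a simplified toy, not Lucene code: `Segment`, `UpgradeSim` and its methods are hypothetical; the "natural merge" rule (always merge the two smallest segments) is an assumed stand-in for a real policy's size-based choices; and the readability check models only the per-segment rule implied by the description (version N reads segments written by N or N-1). It deliberately ignores everything except per-segment formats.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Toy simulation, NOT Lucene code: a segment is a size plus the major
// version of the format it was written in.
record Segment(long sizeMb, int formatMajor) {}

class UpgradeSim {
    // Assumed "natural" merge: combine the two smallest segments into one new
    // segment written in the current format. Large segments are picked last,
    // so they can survive many merges untouched.
    static List<Segment> naturalMerge(List<Segment> segs, int currentMajor) {
        List<Segment> sorted = new ArrayList<>(segs);
        sorted.sort(Comparator.comparingLong(Segment::sizeMb));
        if (sorted.size() < 2) return sorted;
        Segment a = sorted.remove(0), b = sorted.remove(0);
        sorted.add(new Segment(a.sizeMb() + b.sizeMb(), currentMajor));
        return sorted;
    }

    // Hypothetical "rewrite all segments": every segment older than the current
    // format is rewritten (same size, current format), whether or not it would
    // ever be chosen for a natural merge.
    static List<Segment> rewriteAll(List<Segment> segs, int currentMajor) {
        List<Segment> out = new ArrayList<>();
        for (Segment s : segs) {
            out.add(s.formatMajor() < currentMajor ? new Segment(s.sizeMb(), currentMajor) : s);
        }
        return out;
    }

    // Toy rule: version `major` can read segments written by major or major-1.
    static boolean readableBy(List<Segment> segs, int major) {
        return segs.stream().allMatch(s -> s.formatMajor() >= major - 1);
    }

    public static void main(String[] args) {
        // Index written by 5.x: one big segment and three small ones.
        List<Segment> segs = List.of(new Segment(5000, 5), new Segment(10, 5),
                                     new Segment(20, 5), new Segment(30, 5));
        List<Segment> under6 = segs;
        for (int i = 0; i < 2; i++) under6 = naturalMerge(under6, 6); // merges while running 6.x
        System.out.println("natural merges only, readable by 7.x? " + readableBy(under6, 7));
        System.out.println("after rewrite-all,   readable by 7.x? " + readableBy(rewriteAll(under6, 6), 7));
    }
}
```

In this run the big 5.x segment is never picked by the two natural merges, so the 7.x check fails; the rewrite-all pass leaves only 6.x-format segments and the check passes.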