Mailing-List: contact dev-help@lucene.apache.org; run by ezmlm
Reply-To: dev@lucene.apache.org
Date: Fri, 6 Jan 2017 04:16:58 +0000 (UTC)
From: "Cao Manh Dat (JIRA)"
To: dev@lucene.apache.org
Subject: [jira] [Comment Edited] (SOLR-9922) Write buffering updates to another tlog
archived-at: Fri, 06 Jan 2017 04:17:01 -0000

    [ https://issues.apache.org/jira/browse/SOLR-9922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15803485#comment-15803485 ]

Cao Manh Dat edited comment on SOLR-9922 at 1/6/17 4:16 AM:
------------------------------------------------------------

In the current code, FLAG_GAP is used in RecoveryStrategy, we
first check whether lastOperation has FLAG_GAP. If it does, we are sure that the buffered updates were not applied (because the node failed during buffering), so we skip PeerSync and go directly to the replication process. In my patch, I detect this event by checking whether any old buffer log exists. So I'm worried about the case where lastOperation has FLAG_GAP when users restart the whole cluster with the new code: instead of going to the replication process, the new code will go to PeerSync.

was (Author: caomanhdat):
In current code, FLAG_GAP is used in RecoveryStrategy, we first check lastOperation have FLAG_GAP, if yes we are sure that buffering updates is not applied ( because the node failed during buffering ) so we skip peersync and go directly to replication process. In my patch, I detect this event by checking that any old buffer log exist. So I'm worry about the case when the lastOperation have FLAG_GAP when users restart the whole cluster with new code. That the reason why I said that "all nodes should be in ACTIVE state".

> Write buffering updates to another tlog
> ---------------------------------------
>
>                 Key: SOLR-9922
>                 URL: https://issues.apache.org/jira/browse/SOLR-9922
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public)
>            Reporter: Cao Manh Dat
>         Attachments: SOLR-9922.patch, SOLR-9922.patch, SOLR-9922.patch
>
>
> Currently, we write buffered updates to the current tlog and do not apply those updates to the index. We then rely on log replay to apply them to the index. But at the same time, other updates are also written to the current tlog and applied to the index.
> For example, during PeerSync, if new updates come to the replica, we will end up with this tlog
> tlog : old1, new1, new2, old2, new3, old3
> old updates belong to PeerSync, and these updates are applied to the index.
> new updates belong to buffering updates, and these updates are not applied to the index.
> But writing all the updates to the same current tlog makes the code base very complex. We should write buffering updates to another tlog file.
> Doing this will make our code base simpler. It also makes replica recovery for SOLR-9835 easier, because after PeerSync succeeds we can copy the new updates from the temporary file to the current tlog, for example
> tlog : old1, old2, old3
> temporary tlog : new1, new2, new3
> -->
> tlog : old1, old2, old3, new1, new2, new3

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org
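
The proposal in the issue description can be illustrated with a small in-memory model. This is only a sketch under stated assumptions: the class BufferTlogSketch and all of its method names are hypothetical and are not Solr's UpdateLog or RecoveryStrategy API, plain lists stand in for tlog files, and "a leftover buffer log exists" is modelled as a non-empty buffer list.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the idea in SOLR-9922; none of these names come from Solr.
class BufferTlogSketch {
    // Stand-in for the current tlog: updates that were applied to the index.
    private final List<String> tlog = new ArrayList<>();
    // Stand-in for the separate (temporary) tlog: buffered, not yet applied.
    private final List<String> bufferTlog = new ArrayList<>();
    private boolean buffering = false;

    void startBuffering() {
        buffering = true;
    }

    // "old" updates fetched during PeerSync: applied to the index, so they
    // go straight to the current tlog.
    void addFromPeerSync(String update) {
        tlog.add(update);
    }

    // "new" updates arriving while the replica is still recovering.
    void addIncoming(String update) {
        if (buffering) {
            bufferTlog.add(update);   // kept apart instead of interleaved
        } else {
            tlog.add(update);
        }
    }

    // After PeerSync succeeds: copy buffered updates to the end of the
    // current tlog, then drop the temporary log.
    void applyBufferedUpdates() {
        tlog.addAll(bufferTlog);
        bufferTlog.clear();
        buffering = false;
    }

    // Assumed stand-in for the check described in the comment: a leftover
    // buffer log means the buffered updates were never applied.
    boolean hasLeftoverBufferLog() {
        return !bufferTlog.isEmpty();
    }

    List<String> tlog() {
        return new ArrayList<>(tlog);
    }

    List<String> bufferTlog() {
        return new ArrayList<>(bufferTlog);
    }

    public static void main(String[] args) {
        BufferTlogSketch log = new BufferTlogSketch();
        log.startBuffering();
        // Same arrival order as in the issue: old1, new1, new2, old2, new3, old3
        log.addFromPeerSync("old1");
        log.addIncoming("new1");
        log.addIncoming("new2");
        log.addFromPeerSync("old2");
        log.addIncoming("new3");
        log.addFromPeerSync("old3");
        System.out.println("tlog           : " + log.tlog());
        System.out.println("temporary tlog : " + log.bufferTlog());
        log.applyBufferedUpdates();
        System.out.println("tlog after copy: " + log.tlog());
    }
}
```

With the arrival order from the issue, the two lists stay separate until applyBufferedUpdates() runs, after which the current tlog holds the old updates followed by the new ones. If the node fails before that call, hasLeftoverBufferLog() remains true, which is the condition the comment proposes to check in place of FLAG_GAP.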