Date: Fri, 2 Jun 2017 12:38:04 +0000 (UTC)
From: "Paulo Motta (JIRA)"
To: commits@cassandra.apache.org
Subject: [jira] [Comment Edited] (CASSANDRA-10130) Node failure during 2i update after streaming can have incomplete 2i when restarted

    [ https://issues.apache.org/jira/browse/CASSANDRA-10130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16034602#comment-16034602 ]

Paulo Motta edited comment on CASSANDRA-10130 at 6/2/17 12:38 PM:
------------------------------------------------------------------

bq. I think in such case the new Index initialization task would take care of indexing, so it doesn't really matter if the new index misses the sstable notification. So, to cover against the specific race you mentioned, we can just filter out in SIM#handleNotification(), when receiving the SSTableAddedNotification, all indexes not marked as building, as we can assume those missed the first notification because not yet registered (and being just registered, the initialization task will eventually take care of any initial indexing).

It's still possible that an index is created after the {{SSTableAddedNotification}} but before all sstables are added to the tracker. In this case the initial index rebuild will not index the sstables that were added, and filtering all indexes marked as building will not re-mark the new index as building, since it may still be doing the initial rebuild:
1. {{SSTableBeforeAddNotification}} is received by SIM
3. new index is created (it will not index the new sstables since they are not in the tracker yet) and is marked as building
4. New sstables are added to the tracker
5. {{SSTableAddedNotification}} is received and index rebuild of new sstables is triggered - the index is not re-marked as building because it's already marked as building by the index creation
6. original index rebuild is finished and index is marked as built
7. index rebuild of new sstables fails but index is marked as built

This case is extremely unlikely to happen but shows the fragility of this approach, which may be prone to other races we're not aware of.

bq. I think the race above can be solved easily without adding columns to the system table. Other than that, let's not forget the pendingBuilds counter was needed to protect us not just against concurrent building of multiple indexes, but also against concurrent building of multiple sstables "batches" for the same index.
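
For readers following the thread, here is a minimal sketch of the kind of per-index pending-builds counter the quote refers to, replayed against steps 3 to 7 of the race above. It is an illustration only: `PendingBuildTracker`, its methods, and the in-memory `built` set (standing in for the persisted "built" marker) are hypothetical names, not code from the CASSANDRA-10130 patch.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration; class and method names are not Cassandra's.
final class PendingBuildTracker {
    // Number of in-flight builds (initial build or sstable batches) per index.
    private final Map<String, Integer> pendingBuilds = new HashMap<>();
    // Indexes with at least one failed build; sticky until clearFailure() is called.
    private final Set<String> failed = new HashSet<>();
    // In-memory stand-in for the persisted "index is built" marker.
    private final Set<String> built = new HashSet<>();

    /** Called whenever any build (initial or sstable batch) starts for the index. */
    synchronized void markBuilding(String index) {
        pendingBuilds.merge(index, 1, Integer::sum);
        built.remove(index);
    }

    /** Called when a build ends; the index becomes built only when nothing is pending and nothing failed. */
    synchronized void markDone(String index, boolean success) {
        int pending = pendingBuilds.getOrDefault(index, 0);
        if (pending == 0) throw new IllegalStateException("no build in flight for " + index);
        if (!success) failed.add(index);
        if (pending == 1) {
            pendingBuilds.remove(index);
            if (!failed.contains(index)) built.add(index);
        } else {
            pendingBuilds.put(index, pending - 1);
        }
    }

    /** Called before a full rebuild is attempted again after a failure. */
    synchronized void clearFailure(String index) { failed.remove(index); }

    synchronized boolean isBuilt(String index) { return built.contains(index); }

    public static void main(String[] args) {
        PendingBuildTracker t = new PendingBuildTracker();
        t.markBuilding("idx");    // step 3: index created, initial build starts
        t.markBuilding("idx");    // step 5: rebuild of the new sstables starts
        t.markDone("idx", true);  // step 6: initial build finishes
        t.markDone("idx", false); // step 7: rebuild of the new sstables fails
        System.out.println("built after failed batch: " + t.isBuilt("idx"));
    }
}
```

Under these assumptions the replay prints `built after failed batch: false`, because the second build keeps the counter above zero when the initial build completes. A plain boolean "building" flag would have marked the index as built at step 6.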

My suggestion was not to add new columns to the system table, but rather overload the existing columns with the following semantics:
- When an index is created for the first time for a table add new entry {{INSERT INTO system.IndexInfo (table_name, index_name) VALUES ("table", "table")}}
- When new sstables are received remove this entry from the index while these sstables are being indexed
- On restart, if this entry is not present it means index rebuild failure for all indexes which need rebuild

To protect against concurrent building of multiple sstables "batches" for the same index, we could still have a pendingBuilds counter but a single counter for all indexes rather than one counter per index. For manual rebuilds we could simply prevent a new index rebuild if there's one already running, since it doesn't make much sense to do simultaneous full rebuilds of the same indexes.

In any case this is just a suggestion, I'm fine if we go with another approach (such as saving which indices were marked on the first notification) or decide this race is too unlikely to bother doing anything.


was (Author: pauloricardomg):
bq. I think in such case the new Index initialization task would take care of indexing, so it doesn't really matter if the new index misses the sstable notification. So, to cover against the specific race you mentioned, we can just filter out in SIM#handleNotification(), when receiving the SSTableAddedNotification, all indexes not marked as building, as we can assume those missed the first notification because not yet registered (and being just registered, the initialization task will eventually take care of any initial indexing).

It's still possible that an index is created after the {{SSTableAddedNotification}} but before all sstables are added to the tracker. In this case the initial index rebuild will not index the sstables that were added, and filtering all indexes marked as building will not re-mark the new index as building, since it may still be doing the initial rebuild:
1. {{SSTableBeforeAddNotification}} is received by SIM
3. new index is created (it will not index the new sstables since they are not in the tracker yet) and is marked as building
4. New sstables are added to the tracker
5. {{SSTableAddedNotification}} is received and index rebuild of new sstables is triggered - the index is not re-marked as building because it's already marked as building by the index creation
6. original index rebuild is finished and index is marked as built
7. index rebuild of new sstables fails but index is marked as built

This case is extremely unlikely to happen but shows the fragility of this approach, which may be prone to other races we're not aware of.

bq. I think the race above can be solved easily without adding columns to the system table. Other than that, let's not forget the pendingBuilds counter was needed to protect us not just against concurrent building of multiple indexes, but also against concurrent building of multiple sstables "batches" for the same index.

My suggestion was not to add new columns to the system table, but rather overload the existing columns with the following semantics:
- When an index is created add new entry {{INSERT INTO system.IndexInfo (table_name, index_name) VALUES ("table", "table")}}
- When new sstables are received remove this entry from the index while these sstables are being indexed
- On restart, if this entry is not present it means index rebuild failure for all indexes which need rebuild

To protect against concurrent building of multiple sstables "batches" for the same index, we could still have a pendingBuilds counter but a single counter for all indexes rather than one counter per index. For manual rebuilds we could simply prevent a new index rebuild if there's one already running, since it doesn't make much sense to do simultaneous full rebuilds of the same indexes.

In any case this is just a suggestion, I'm fine if we go with other approach (such as saving which indices were marked on the first notification) or decide this race is too unlikely to bother doing anything.

> Node failure during 2i update after streaming can have incomplete 2i when restarted
> -----------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-10130
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10130
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Coordination
>            Reporter: Yuki Morishita
>            Assignee: Andrés de la Peña
>            Priority: Minor
>
> Since MV/2i update happens after SSTables are received, node failure during MV/2i update can leave received SSTables live when restarted while MV/2i are partially up to date.
> We can add some kind of tracking mechanism to automatically rebuild at the startup, or at least warn user when the node restarts.
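
The marker semantics proposed in the comment above (one table-level entry, removed while received sstables are indexed, checked on restart, guarded by a single pendingBuilds counter) could be sketched roughly as follows. Everything here is an assumption made for illustration: `TableIndexMarker`, its method names, and the `Set<String>` that stands in for rows of {{system.IndexInfo}} are hypothetical and do not reflect the actual patch.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch; the Set<String> stands in for rows of system.IndexInfo.
final class TableIndexMarker {
    private final String marker;         // the (table_name, index_name) = (table, table) entry
    private final Set<String> indexInfo; // assumed to survive a restart
    private int pendingBuilds = 0;       // single counter shared by all indexes of the table
    private boolean failed = false;      // set once any batch fails; in-memory only

    TableIndexMarker(String table, Set<String> indexInfo) {
        this.marker = markerKey(table);
        this.indexInfo = indexInfo;
    }

    static String markerKey(String table) { return table + "/" + table; }

    /** First index created for the table: add the marker entry. */
    synchronized void onFirstIndexCreated() { indexInfo.add(marker); }

    /** New sstables received: remove the marker while they are being indexed. */
    synchronized void onSSTablesReceived() {
        pendingBuilds++;
        indexInfo.remove(marker);
    }

    /** A batch finished: restore the marker only if nothing is pending and nothing failed. */
    synchronized void onBatchIndexed(boolean success) {
        if (pendingBuilds == 0) throw new IllegalStateException("no batch in flight");
        if (!success) failed = true;
        pendingBuilds--;
        if (pendingBuilds == 0 && !failed) indexInfo.add(marker);
    }

    /** On restart, for a table that has indexes: a missing marker means a rebuild is needed. */
    static boolean needsRebuildOnRestart(String table, Set<String> indexInfo) {
        return !indexInfo.contains(markerKey(table));
    }

    public static void main(String[] args) {
        Set<String> store = new HashSet<>();
        TableIndexMarker m = new TableIndexMarker("tbl", store);
        m.onFirstIndexCreated();
        m.onSSTablesReceived(); // a crash here leaves the marker absent
        System.out.println("rebuild needed: " + needsRebuildOnRestart("tbl", store));
    }
}
```

The single counter keeps the marker absent until every concurrent batch has finished, which is the property the comment asks for. The trade-off is coarser granularity: a failure in any one index forces a rebuild of all indexes on the table.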

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@cassandra.apache.org
For additional commands, e-mail: commits-help@cassandra.apache.org