Return-Path:
X-Original-To: apmail-hbase-dev-archive@www.apache.org
Delivered-To: apmail-hbase-dev-archive@www.apache.org
Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id A053A175C0 for ; Tue, 9 Jun 2015 22:02:02 +0000 (UTC)
Received: (qmail 88451 invoked by uid 500); 9 Jun 2015 22:02:01 -0000
Delivered-To: apmail-hbase-dev-archive@hbase.apache.org
Received: (qmail 88340 invoked by uid 500); 9 Jun 2015 22:02:01 -0000
Mailing-List: contact dev-help@hbase.apache.org; run by ezmlm
Precedence: bulk
List-Help:
List-Unsubscribe:
List-Post:
List-Id:
Reply-To: dev@hbase.apache.org
Delivered-To: mailing list dev@hbase.apache.org
Received: (qmail 88061 invoked by uid 99); 9 Jun 2015 22:02:01 -0000
Received: from arcas.apache.org (HELO arcas.apache.org) (140.211.11.28) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 09 Jun 2015 22:02:01 +0000
Date: Tue, 9 Jun 2015 22:02:00 +0000 (UTC)
From: "Enis Soztutar (JIRA)"
To: dev@hbase.apache.org
Message-ID:
In-Reply-To:
References:
Subject: [jira] [Created] (HBASE-13877) Interrupt to flush from TableFlushProcedure causes dataloss in ITBLL
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394

Enis Soztutar created HBASE-13877:
-------------------------------------

             Summary: Interrupt to flush from TableFlushProcedure causes dataloss in ITBLL
                 Key: HBASE-13877
                 URL: https://issues.apache.org/jira/browse/HBASE-13877
             Project: HBase
          Issue Type: Bug
            Reporter: Enis Soztutar
            Assignee: Enis Soztutar
            Priority: Blocker
             Fix For: 2.0.0, 1.2.0, 1.1.1

ITBLL with 1.25B rows failed for me (and Stack as reported in https://issues.apache.org/jira/browse/HBASE-13811?focusedCommentId=14577834&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14577834)

HBASE-13811 and HBASE-13853 fixed an issue with WAL edit filtering. The root cause this time seems to be different.
It is due to the procedure-based flush interrupting the flush request when the procedure is cancelled because of an exception elsewhere. This leaves the memstore snapshot intact without aborting the server. The next flush then flushes the previous memstore with the current seqId (as opposed to the seqId from the memstore snapshot). This creates an hfile whose seqId is larger than that of its contents.

Previous behavior in 0.98 and 1.0 (I believe) is that, after flush prepare, an interruption / exception will cause an RS abort.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
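
The sequence described in the report can be illustrated with a small toy model. This is only a sketch under stated assumptions, not HBase code: `FlushSeqIdSketch`, `FlushedFile`, `put`, `prepare`, `commit` and `simulate` are hypothetical names invented for the illustration, and the model assumes that a second flush prepare reuses a snapshot left behind by an interrupted flush.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Toy model of the reported sequence. None of these names are HBase classes. */
class FlushSeqIdSketch {

    /** A flushed file: the seqId stamped on it and the highest edit seqId it really holds. */
    static final class FlushedFile {
        final long stampedSeqId;
        final long maxEditSeqId;

        FlushedFile(long stampedSeqId, long maxEditSeqId) {
            this.stampedSeqId = stampedSeqId;
            this.maxEditSeqId = maxEditSeqId;
        }
    }

    private final List<Long> active = new ArrayList<>(); // seqIds of edits in the active memstore
    private List<Long> snapshot = null;                  // edits captured by flush prepare
    private long snapshotSeqId = -1;                     // seqId recorded at flush prepare
    private long currentSeqId = 0;

    /** Each edit gets the next sequence id. */
    void put() {
        active.add(++currentSeqId);
    }

    /** Flush prepare. Assumption: a leftover snapshot is reused rather than replaced. */
    void prepare() {
        if (snapshot == null) {
            snapshot = new ArrayList<>(active);
            active.clear();
            snapshotSeqId = currentSeqId;
        }
    }

    /** Flush commit: write the snapshot out, stamped with one of the two candidate seqIds. */
    FlushedFile commit(boolean useSnapshotSeqId) {
        long stamped = useSnapshotSeqId ? snapshotSeqId : currentSeqId;
        long maxEdit = Collections.max(snapshot);
        snapshot = null;
        return new FlushedFile(stamped, maxEdit);
    }

    /** Interrupted flush followed by a second flush that stamps the current seqId. */
    static FlushedFile simulate() {
        FlushSeqIdSketch store = new FlushSeqIdSketch();
        store.put(); store.put(); store.put(); // edits 1..3
        store.prepare();                       // snapshot holds 1..3
        // The flush is interrupted here: no commit, no abort, snapshot stays in place.
        store.put(); store.put();              // edits 4..5 land in the active memstore
        store.prepare();                       // leftover snapshot is reused
        return store.commit(false);            // stamped with the current seqId
    }

    public static void main(String[] args) {
        FlushedFile f = simulate();
        System.out.println("stamped seqId=" + f.stampedSeqId
                + ", max edit seqId in file=" + f.maxEditSeqId);
    }
}
```

In this model the file is stamped with seqId 5 while its contents only go up to seqId 3, and edits 4 and 5 still exist only in the memstore. Calling `commit(true)` instead would stamp the file with the seqId recorded at flush prepare.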