Date: Sun, 22 Jan 2017 07:17:26 +0000 (UTC)
From: "Allan Yang (JIRA)"
To: dev@hbase.apache.org
Subject: [jira] [Created] (HBASE-17506) started mvcc transaction is not completed in branch-1

Allan Yang created HBASE-17506:
----------------------------------

             Summary: started mvcc transaction is not completed in branch-1
                 Key: HBASE-17506
                 URL: https://issues.apache.org/jira/browse/HBASE-17506
             Project: HBase
          Issue Type: Bug
    Affects Versions: 1.4.0
            Reporter: Allan Yang
            Assignee: Allan Yang


In {{doMiniBatchMutation}}, if we are in replay and the nonce of a mutation differs from the current one, we append the edits collected so far as a separate WAL entry. But since HBASE-14465, an mvcc transaction is started in the ringbuffer's append thread. So every time we append a WAL entry here, an mvcc transaction is started, yet that transaction is never completed anywhere. Because the region's read point cannot advance past an uncompleted transaction, this can block other mvcc transactions of this region.

{code}
          // txid should always increase, so having the one from the last call is ok.
          // we use HLogKey here instead of WALKey directly to support legacy coprocessors.
          walKey = new ReplayHLogKey(this.getRegionInfo().getEncodedNameAsBytes(),
            this.htableDescriptor.getTableName(), now, m.getClusterIds(),
            currentNonceGroup, currentNonce, mvcc);
          txid = this.wal.append(this.htableDescriptor, this.getRegionInfo(), walKey,
            walEdit, true);
          walEdit = new WALEdit(cellCount, isInReplay);
          walKey = null;
{code}

I looked at the master branch, and it does not have this problem. It has a method named {{appendCurrentNonces}}:

{code}
  private void appendCurrentNonces(final Mutation mutation, final boolean replay,
      final WALEdit walEdit, final long now, final long currentNonceGroup,
      final long currentNonce) throws IOException {
    if (walEdit.isEmpty()) return;
    if (!replay) throw new IOException("Multiple nonces per batch and not in replay");
    WALKey walKey = new WALKey(this.getRegionInfo().getEncodedNameAsBytes(),
        this.htableDescriptor.getTableName(), now, mutation.getClusterIds(),
        currentNonceGroup, currentNonce, mvcc, this.getReplicationScope());
    this.wal.append(this.getRegionInfo(), walKey, walEdit, true);
    // Complete the mvcc transaction started down in append else it will block others
    this.mvcc.complete(walKey.getWriteEntry());
  }
{code}

The easiest way to fix branch-1 would be to complete the writeEntry as the master branch does. But is that really safe?

1. Question 1: Completing the mvcc transaction before waiting for the sync will affect data visibility, since the read point can advance before the WAL edit has been synced.
2. Question 2: In what circumstances will there be different nonces and nonce groups in a single WAL entry? Nonces are used by append/increment, but in {{batchMutate}} we treat those operations differently and append one WAL entry for each of them. So I think no test can reach this code path, which may be why no one has found this bug (please tell me if I'm wrong).

-- 
This message was sent by Atlassian JIRA
(v6.3.4#6332)
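To illustrate why a started-but-never-completed mvcc transaction blocks the region, here is a minimal Java sketch of a toy mvcc. It is loosely modeled on the begin/complete/read-point idea in HBase's mvcc but is not its actual implementation; `MvccLeakSketch`, `ToyMvcc` and `WriteEntry` are hypothetical names. The assumption is that the read point only advances over a contiguous prefix of completed write entries, so one leaked entry pins it.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/**
 * Hypothetical, simplified model of a region's mvcc (NOT the real HBase class).
 * Each write gets an increasing write number; the read point only advances over
 * a contiguous prefix of completed writes.
 */
public class MvccLeakSketch {

  static final class WriteEntry {
    final long writeNumber;
    boolean completed;

    WriteEntry(long writeNumber) {
      this.writeNumber = writeNumber;
    }
  }

  static final class ToyMvcc {
    private long readPoint = 0;
    private long nextWriteNumber = 1;
    private final Deque<WriteEntry> queue = new ArrayDeque<>();

    // Start a transaction; assumed to happen inside the WAL append, as described above.
    synchronized WriteEntry begin() {
      WriteEntry e = new WriteEntry(nextWriteNumber++);
      queue.addLast(e);
      return e;
    }

    // Mark an entry done, then advance the read point while the oldest entry is done.
    synchronized void complete(WriteEntry e) {
      e.completed = true;
      while (!queue.isEmpty() && queue.peekFirst().completed) {
        readPoint = queue.pollFirst().writeNumber;
      }
    }

    synchronized long getReadPoint() {
      return readPoint;
    }
  }

  public static void main(String[] args) {
    ToyMvcc mvcc = new ToyMvcc();

    // Like the branch-1 replay path: a transaction is started and never completed.
    WriteEntry leaked = mvcc.begin();

    // A later, well-behaved write on the same region.
    WriteEntry normal = mvcc.begin();
    mvcc.complete(normal);
    System.out.println("read point with leak: " + mvcc.getReadPoint()); // prints 0

    // What master does: complete the entry started by the append.
    mvcc.complete(leaked);
    System.out.println("read point after completing: " + mvcc.getReadPoint()); // prints 2
  }
}
```

Completing `leaked` unblocks the read point, which mirrors the {{this.mvcc.complete(walKey.getWriteEntry())}} call in master's {{appendCurrentNonces}}. Note that this toy model has no notion of a WAL sync: complete() advances the read point immediately, and that ordering is exactly what Question 1 is asking about.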