Date: Thu, 12 Mar 2015 20:51:38 +0000
From: "Colin Patrick McCabe (JIRA)"
To: hdfs-issues@hadoop.apache.org
Subject: [jira] [Commented] (HDFS-7915) The DataNode can sometimes allocate a ShortCircuitShm slot and fail to tell the DFSClient about it because of a network error

    [ https://issues.apache.org/jira/browse/HDFS-7915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14359377#comment-14359377 ]

Colin Patrick McCabe commented on HDFS-7915:
--------------------------------------------

bq. cnauroth asked: Thanks for the patch, Colin. The change looks good. In the test, is the Visitor indirection necessary, or would it be easier to add 2 VisibleForTesting getters that return the segments and slots directly to the test code?

The problem is locking.
If there is a getter for these hash tables, is the caller going to take the appropriate locks when accessing them? If not, we get FindBugs warnings and possibly real bugs in the test. If so, it adds a lot of coupling between the unit test and the registry code. In contrast, the visitor interface lets the unit test see a single consistent snapshot of what is going on in the {{ShortCircuitRegistry}}.

> The DataNode can sometimes allocate a ShortCircuitShm slot and fail to tell the DFSClient about it because of a network error
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-7915
>                 URL: https://issues.apache.org/jira/browse/HDFS-7915
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.7.0
>            Reporter: Colin Patrick McCabe
>            Assignee: Colin Patrick McCabe
>         Attachments: HDFS-7915.001.patch, HDFS-7915.002.patch, HDFS-7915.004.patch
>
>
> The DataNode can sometimes allocate a ShortCircuitShm slot and fail to tell the DFSClient about it because of a network error. In {{DataXceiver#requestShortCircuitFds}}, the DataNode can succeed at the first part (mark the slot as used) and fail at the second part (tell the DFSClient what it did). The "try" block for unregistering the slot only covers a failure in the first part, not the second part. In this way, the DFSClient and the DataNode can end up with diverging views of which slots are allocated.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
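The locking argument in the comment can be illustrated with a minimal Java sketch. It assumes a heavily simplified registry: the class name `SketchRegistry`, the `Visitor` signature, and the use of plain strings as segment and slot IDs are all hypothetical and are not the actual `ShortCircuitRegistry` API. What it shows is that `visit` runs the callback while holding the same monitor that guards every mutation, so a test sees both tables in one consistent state without knowing how the registry locks.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical, simplified stand-in for a registry like ShortCircuitRegistry.
 * Segment and slot IDs are plain strings purely for illustration.
 */
class SketchRegistry {
  /** Callback that sees both tables while the registry lock is held. */
  interface Visitor {
    void accept(Map<String, Integer> segments, Set<String> slots);
  }

  // Both tables are guarded by "this"; callers never get direct references.
  private final Map<String, Integer> segments = new HashMap<>(); // shm ID -> slots in use
  private final Set<String> slots = new HashSet<>();

  synchronized void registerSlot(String shmId, String slotId) {
    segments.merge(shmId, 1, Integer::sum);
    slots.add(slotId);
  }

  synchronized void unregisterSlot(String shmId, String slotId) {
    if (slots.remove(slotId)) {
      // Returning null from the remapping function removes the segment entry.
      segments.computeIfPresent(shmId, (k, n) -> n > 1 ? n - 1 : null);
    }
  }

  /**
   * Runs the visitor under the same lock that guards all mutations, so it
   * observes one consistent snapshot of both tables. The read-only views
   * should not be retained after accept() returns.
   */
  synchronized void visit(Visitor visitor) {
    visitor.accept(Collections.unmodifiableMap(segments),
        Collections.unmodifiableSet(slots));
  }
}
```

A test would pass a lambda to `visit` and copy out whatever counts it wants to assert on, instead of holding references to the tables after the lock has been released.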
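The fix the issue calls for can be sketched the same way: keep the rollback in effect until the client has been told about the slot, not just until the slot has been marked as used. This is a hypothetical sketch; `SlotHandshakeSketch`, `ClientChannel`, and the message format are illustrative stand-ins, not the real `DataXceiver#requestShortCircuitFds` code or the HDFS wire protocol, and it assumes slot IDs are unique.

```java
import java.io.IOException;
import java.util.HashSet;
import java.util.Set;

/**
 * Hypothetical sketch of the allocate-then-notify sequence; NOT the actual
 * DataXceiver code. The slot is rolled back unless both steps succeed.
 */
class SlotHandshakeSketch {
  /** Hypothetical connection to the client; send() may fail with a network error. */
  interface ClientChannel {
    void send(String message) throws IOException;
  }

  // Slots the DataNode currently believes are allocated (IDs assumed unique).
  final Set<String> registeredSlots = new HashSet<>();

  void requestSlot(String slotId, ClientChannel client) throws IOException {
    boolean success = false;
    try {
      // Step 1: mark the slot as used on the DataNode side.
      registeredSlots.add(slotId);
      // Step 2: tell the client; a network error here throws IOException.
      client.send("SLOT_ALLOCATED " + slotId);
      // Only reached once the client has been told about the slot.
      success = true;
    } finally {
      if (!success) {
        // Undo step 1 so the DataNode's view matches the client's.
        registeredSlots.remove(slotId);
      }
    }
  }
}
```

Because `success` is set only after `send` returns, an `IOException` from the second step also removes the newly added slot, which is the coverage the issue says the original `try` block lacked.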