hadoop-hdfs-issues mailing list archives

From "Colin Patrick McCabe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-7915) The DataNode can sometimes allocate a ShortCircuitShm slot and fail to tell the DFSClient about it because of a network error
Date Thu, 12 Mar 2015 20:51:38 GMT

    [ https://issues.apache.org/jira/browse/HDFS-7915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14359377#comment-14359377 ]

Colin Patrick McCabe commented on HDFS-7915:
--------------------------------------------

bq. cnauroth asked: Thanks for the patch, Colin. The change looks good. In the test, is the
Visitor indirection necessary, or would it be easier to add 2 VisibleForTesting getters that
return the segments and slots directly to the test code?

The problem is locking. If there is a getter for these hash tables, is the caller going to
take the appropriate locks when accessing them? If not, we get findbugs warnings and possibly
real race conditions in the test. If so, it adds a lot of coupling between the unit test and
the registry code. In contrast, the visitor interface lets the unit test see a single consistent
snapshot of what is going on in the {{ShortCircuitRegistry}}.

> The DataNode can sometimes allocate a ShortCircuitShm slot and fail to tell the DFSClient about it because of a network error
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-7915
>                 URL: https://issues.apache.org/jira/browse/HDFS-7915
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.7.0
>            Reporter: Colin Patrick McCabe
>            Assignee: Colin Patrick McCabe
>         Attachments: HDFS-7915.001.patch, HDFS-7915.002.patch, HDFS-7915.004.patch
>
>
> The DataNode can sometimes allocate a ShortCircuitShm slot and fail to tell the DFSClient
> about it because of a network error. In {{DataXceiver#requestShortCircuitFds}}, the DataNode
> can succeed at the first part (mark the slot as used) and fail at the second part (tell the
> DFSClient what it did). The "try" block for unregistering the slot only covers a failure in
> the first part, not the second part. In this way, the DFSClient and the DataNode can end up
> with diverging views of which slots are allocated.
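A minimal sketch of the failure mode above and one way to close it, assuming a simplified server. This is not the actual HDFS-7915 patch or the real {{DataXceiver}} code; {{ShortCircuitSlotServer}}, {{ClientChannel}}, and {{sendSlotAllocated}} are hypothetical names. The idea shown is that the rollback must cover the reply to the client (part two), not just the allocation (part one):

```java
import java.io.IOException;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch; none of these names are real HDFS APIs.
class ShortCircuitSlotServer {
  /** Slots the server believes are allocated (the server-side view). */
  final Set<String> allocatedSlots = new HashSet<>();

  /** Stand-in for the network channel back to the client. */
  interface ClientChannel {
    void sendSlotAllocated(String slotId) throws IOException;
  }

  /**
   * Part 1 marks the slot as used; part 2 tells the client. The rollback
   * guards part 2 as well, so a network error while replying releases the
   * slot instead of leaking it on the server side.
   */
  void requestSlot(String slotId, ClientChannel client) throws IOException {
    allocatedSlots.add(slotId);            // part 1: mark the slot as used
    boolean success = false;
    try {
      client.sendSlotAllocated(slotId);    // part 2: may fail on a network error
      success = true;
    } finally {
      if (!success) {
        allocatedSlots.remove(slotId);     // roll back part 1
      }
    }
  }
}
```

With this shape, an {{IOException}} from the reply still propagates to the caller, but {{allocatedSlots}} no longer contains the slot. The sketch covers only the server-side rollback described in the issue; it does not address what a client that partially received the reply should do.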



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
