hadoop-hdfs-issues mailing list archives

From "Colin Patrick McCabe (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HDFS-4417) HDFS-347: fix case where local reads get disabled incorrectly
Date Fri, 18 Jan 2013 23:52:13 GMT

     [ https://issues.apache.org/jira/browse/HDFS-4417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Colin Patrick McCabe updated HDFS-4417:

    Attachment: HDFS-4417.002.patch

This patch:

* adds the ability to get a closed {{DomainSocket}} with {{DomainSocket#getClosedSocket}}
(even if UNIX domain sockets are not enabled).

* fixes the JavaDoc on {{BlockReaderFactory}}-- it can NOT return null.  I thought we had
fixed this already; guess not.

* adds the ability to search only for {{DomainPeers}} in {{PeerCache}}.  I found that a
simple {{TreeMap}} was adequate here; nothing fancier was needed.  I haven't measured the
efficiency yet, but I expect it to improve.

* {{DataXceiver}}: remove unused import.

* add {{TestParallelShortCircuitReadUnCached}}, a regression test for this JIRA.  Also add

* {{DFSInputStream}}: I found that keeping everything as a single "for" loop was hindering
readability.  I rewrote the various phases as "straight-line code" rather than trying to create
a state machine.  I'm happy to say there are no more break or continue statements, and no more
abuse of booleans.

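
To illustrate the {{PeerCache}} item above, here is a minimal sketch of a {{TreeMap}}-backed cache that can be searched for domain peers only. It is not the code in the patch: {{PeerCacheSketch}}, {{Peer}}, {{Key}} and their fields are hypothetical stand-ins, and expiry and capacity limits are left out. The idea it shows is that a sorted key of (datanode, peer kind, insertion order) lets a single {{ceilingEntry}} call find the oldest peer of the requested kind.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch, not the HDFS PeerCache: all names here are assumptions.
class PeerCacheSketch {
    /** Stand-in for a cached connection to a DataNode. */
    static final class Peer {
        final String datanodeId;
        final boolean isDomain; // true for a UNIX domain socket, false for TCP
        Peer(String datanodeId, boolean isDomain) {
            this.datanodeId = datanodeId;
            this.isDomain = isDomain;
        }
    }

    /** Sort key: datanode first, then peer kind, then insertion order. */
    static final class Key implements Comparable<Key> {
        final String datanodeId;
        final boolean isDomain;
        final long seq;
        Key(String datanodeId, boolean isDomain, long seq) {
            this.datanodeId = datanodeId;
            this.isDomain = isDomain;
            this.seq = seq;
        }
        @Override
        public int compareTo(Key o) {
            int c = datanodeId.compareTo(o.datanodeId);
            if (c != 0) return c;
            c = Boolean.compare(isDomain, o.isDomain);
            if (c != 0) return c;
            return Long.compare(seq, o.seq);
        }
    }

    private final TreeMap<Key, Peer> peers = new TreeMap<>();
    private long nextSeq = 0;

    synchronized void put(Peer peer) {
        peers.put(new Key(peer.datanodeId, peer.isDomain, nextSeq++), peer);
    }

    /** Removes and returns the oldest cached peer of the requested kind, or null. */
    synchronized Peer get(String datanodeId, boolean isDomain) {
        // The smallest possible key for this (datanode, kind) pair.
        Key from = new Key(datanodeId, isDomain, 0L);
        Map.Entry<Key, Peer> e = peers.ceilingEntry(from);
        if (e == null
                || !e.getKey().datanodeId.equals(datanodeId)
                || e.getKey().isDomain != isDomain) {
            return null; // nothing of that kind is cached for this datanode
        }
        peers.remove(e.getKey());
        return e.getValue();
    }

    synchronized int size() {
        return peers.size();
    }

    public static void main(String[] args) {
        PeerCacheSketch cache = new PeerCacheSketch();
        cache.put(new Peer("dn1", false)); // TCP peer
        cache.put(new Peer("dn1", true));  // domain peer
        // A domain-only lookup skips the TCP peer cached for the same datanode.
        Peer p = cache.get("dn1", true);
        System.out.println("found domain peer: " + (p != null && p.isDomain));
    }
}
```

With a lookup like this, a cache full of TCP sockets to the local node no longer hides the fact that no domain socket is cached.
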
> HDFS-347: fix case where local reads get disabled incorrectly
> -------------------------------------------------------------
>                 Key: HDFS-4417
>                 URL: https://issues.apache.org/jira/browse/HDFS-4417
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: datanode, hdfs-client, performance
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>         Attachments: HDFS-4417.002.patch, hdfs-4417.txt
>
> In testing HDFS-347 against HBase (thanks [~jdcryans]) we ran into the following case:
> - a workload is running which puts a bunch of local sockets in the PeerCache
> - the workload abates for a while, causing the sockets to go "stale" (i.e. the DN side disconnects after the keepalive timeout)
> - the workload starts again
> In this case, the local socket retrieved from the cache failed the newBlockReader call, and the client incorrectly disabled local sockets on that host. This is similar to an earlier bug, HDFS-3376, but not quite the same.
> The next issue we ran into is that, once this happened, the client never tried local sockets again, because the cache held lots of TCP sockets. Since we always managed to get a cached socket to the local node, it didn't bother trying a local read again.
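
The failure mode described above can be sketched as follows. This is an illustration only, not the code in the patch: {{LocalReadRetrySketch}}, its {{Peer}} class, the {{open}} method and the {{freshDomainSocketWorks}} flag are all hypothetical. It assumes that a failure on a cached domain peer only means the peer went stale, so only a failure on a freshly opened domain socket is treated as a reason to disable local reads. The phases are written as straight-line code.

```java
import java.io.IOException;
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch, not the HDFS client: all names here are assumptions.
class LocalReadRetrySketch {
    /** Stand-in for a domain socket; stale models "the DN side already disconnected". */
    static final class Peer {
        final boolean stale;
        Peer(boolean stale) {
            this.stale = stale;
        }
    }

    private boolean localReadsDisabled = false;

    boolean localReadsDisabled() {
        return localReadsDisabled;
    }

    /** Stand-in for creating a block reader over a peer; fails on a stale peer. */
    static void newBlockReader(Peer peer) throws IOException {
        if (peer.stale) {
            throw new IOException("peer was closed by the remote side");
        }
    }

    /** Returns "local" if a local read could be set up, otherwise "tcp". */
    String open(Deque<Peer> cachedDomainPeers, boolean freshDomainSocketWorks) {
        if (localReadsDisabled) {
            return "tcp";
        }
        // Phase 1: try cached domain peers. A failure here only shows that the
        // cached peer was stale, so it is discarded and nothing is disabled.
        Peer cached;
        while ((cached = cachedDomainPeers.poll()) != null) {
            try {
                newBlockReader(cached);
                return "local";
            } catch (IOException e) {
                // stale cached peer: drop it and keep going
            }
        }
        // Phase 2: try a fresh domain socket. Only a failure on a fresh socket
        // is taken as evidence that local reads do not work on this host.
        try {
            newBlockReader(new Peer(!freshDomainSocketWorks));
            return "local";
        } catch (IOException e) {
            localReadsDisabled = true;
        }
        // Phase 3: fall back to a TCP read.
        return "tcp";
    }

    public static void main(String[] args) {
        LocalReadRetrySketch client = new LocalReadRetrySketch();
        Deque<Peer> cache = new ArrayDeque<>();
        cache.add(new Peer(true)); // went stale while the workload was idle
        System.out.println(client.open(cache, true));
    }
}
```
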

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
