hadoop-hdfs-issues mailing list archives

From "Liang Xie (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HDFS-6574) make sure addToDeadNodes() is only called when a connection issue occurs
Date Fri, 20 Jun 2014 08:42:24 GMT
Liang Xie created HDFS-6574:

             Summary: make sure addToDeadNodes() is only called when a connection issue occurs
                 Key: HDFS-6574
                 URL: https://issues.apache.org/jira/browse/HDFS-6574
             Project: Hadoop HDFS
          Issue Type: Improvement
          Components: hdfs-client
    Affects Versions: 3.0.0, 2.5.0
            Reporter: Liang Xie
            Assignee: Liang Xie

My colleague [~cuijianwei] found that in an HBase testing scenario, once a bad disk occurred,
local reads were skipped and a large number of remote reads were issued for a lengthy time, say
tens of minutes, until we had to trigger a compaction to help recover locality and read latency.
It turned out to be related to addToDeadNodes(): imagine one disk on the local node goes bad;
the current implementation adds the whole local node to the dead node list, so all of the other
good disks on that node can no longer serve any read requests.
So the better choices here, to me, seem to be:
1) determine whether the underlying IOException is really a connection-related exception, and only
then call addToDeadNodes() (see the sketch below).
2) determine whether the IOException is related to a bad block/disk; if so, skip addToDeadNodes();
otherwise call it.
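
A minimal sketch of option 1). DeadNodePolicy/isConnectionIssue are hypothetical names, not
existing DFSInputStream code; the idea is just to classify the caught IOException before
deciding that the DataNode itself is dead:

{code:java}
import java.io.IOException;
import java.net.ConnectException;
import java.net.NoRouteToHostException;
import java.net.SocketException;
import java.net.SocketTimeoutException;

class DeadNodePolicy {
  /**
   * Heuristic: only network-level failures mean "the node is unreachable".
   * A bad-block/bad-disk IOException (e.g. a ChecksumException) should not
   * condemn the whole node and its remaining good disks.
   */
  static boolean isConnectionIssue(IOException e) {
    return e instanceof ConnectException
        || e instanceof NoRouteToHostException
        || e instanceof SocketTimeoutException
        || e instanceof SocketException;
  }
}
{code}

In the client's read retry loop this would become something like
{code:java}
if (DeadNodePolicy.isConnectionIssue(ioe)) {
  addToDeadNodes(chosenNode);   // node really unreachable
} else {
  // bad block/disk: skip just this replica, keep the node alive
}
{code}
so one failing volume no longer blacklists the node's other good disks.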

Another thing to consider: once we have seen a disk exception from a node, should we refresh
the locatedBlocks info from the NN to clear all the stale cached locations pointing at that
node's bad disk? That refresh could be somewhat heavy if the file is huge...
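
A rough sketch of that refresh, assuming access to the client's ClientProtocol proxy.
LocatedBlocksRefresher is an illustrative name, not existing DFSClient code, though
ClientProtocol#getBlockLocations is the real NameNode RPC:

{code:java}
import java.io.IOException;
import org.apache.hadoop.hdfs.protocol.ClientProtocol;
import org.apache.hadoop.hdfs.protocol.LocatedBlocks;

class LocatedBlocksRefresher {
  /**
   * Re-fetch block locations from the NameNode after a disk-related
   * IOException so cached locations still pointing at the bad disk get
   * replaced. For a huge file this single RPC has to return every
   * block, which is the cost mentioned above.
   */
  static LocatedBlocks refresh(ClientProtocol namenode, String src,
      long fileLength) throws IOException {
    return namenode.getBlockLocations(src, 0, fileLength);
  }
}
{code}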

I plan to make a patch for our internal Hadoop branch, since this degrades HBase read
performance severely, and I'd also like to contribute it to the community if you think this
proposal is not too crazy...   [~stack]

This message was sent by Atlassian JIRA
