hadoop-hdfs-issues mailing list archives

From "nkeywal (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-3705) Add the possibility to mark a node as 'low priority' for read in the DFSClient
Date Fri, 07 Sep 2012 18:45:08 GMT

    [ https://issues.apache.org/jira/browse/HDFS-3705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13450884#comment-13450884 ]

nkeywal commented on HDFS-3705:
-------------------------------

Yes, exactly, on all your points. Latency is key, so the sooner the failure is detected,
the better.
                
> Add the possibility to mark a node as 'low priority' for read in the DFSClient
> ------------------------------------------------------------------------------
>
>                 Key: HDFS-3705
>                 URL: https://issues.apache.org/jira/browse/HDFS-3705
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs client
>    Affects Versions: 1.0.3, 2.0.0-alpha, 3.0.0
>            Reporter: nkeywal
>             Fix For: 3.0.0
>
>         Attachments: hdfs-3705.sample.patch, HDFS-3705.v1.patch
>
>
> This has been partly discussed in HBASE-6435.
> The DFSClient includes 'bad nodes' management for reads and writes. Sometimes, the
> client application already knows that some nodes are dead or likely to be dead.
> An example is the HBase Write-Ahead Log: when HBase reads this file, it knows that
> the HBase regionserver died, and it is very likely that the whole box died, so the
> datanode on the same box is dead as well. This is critical, because:
> - it's the HBase recovery that reads these log files
> - if we read them, it means that we lost a box, so we have 1 dead replica out of the 3
> - for every file read, we have a 33% chance of going to the dead datanode
> - as the box just died, we're very likely to get a timeout exception, so we're delaying
>   the HBase recovery by 1 minute. For HBase, it means that the data is not available
>   during this minute.
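
The reordering idea in the issue title can be sketched as follows. This is only an
illustration, not the attached patch and not the real DFSClient API: the class name
`LowPriorityReplicaOrder`, the method `reorder`, and the hostnames are all hypothetical.
It assumes the client holds an ordered list of replica hosts and a set of hosts the
application suspects are dead, and it moves the suspects to the end of the list instead
of dropping them, so they are still tried as a last resort.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: none of these names come from Hadoop itself.
class LowPriorityReplicaOrder {

    /**
     * Returns the replicas in their original relative order, with every host
     * listed in lowPriority moved to the end. No replica is removed.
     */
    public static List<String> reorder(List<String> replicas, Set<String> lowPriority) {
        List<String> preferred = new ArrayList<>();
        List<String> deferred = new ArrayList<>();
        for (String host : replicas) {
            if (lowPriority.contains(host)) {
                deferred.add(host);   // suspected dead: try last
            } else {
                preferred.add(host);  // no known problem: keep its position
            }
        }
        preferred.addAll(deferred);
        return preferred;
    }

    public static void main(String[] args) {
        // Three replicas; the application suspects that the box hosting dn1 died.
        List<String> replicas = Arrays.asList("dn1", "dn2", "dn3");
        Set<String> suspect = new HashSet<>(Arrays.asList("dn1"));
        System.out.println(reorder(replicas, suspect)); // prints [dn2, dn3, dn1]
    }
}
```

With this ordering, a read would reach the suspected datanode only after the two other
replicas failed, rather than in roughly one read out of three.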

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
