hadoop-hdfs-issues mailing list archives

From "Binglin Chang (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HDFS-4273) Problem in DFSInputStream read retry logic may cause early failure
Date Mon, 06 Jan 2014 08:53:59 GMT

     [ https://issues.apache.org/jira/browse/HDFS-4273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Binglin Chang updated HDFS-4273:
--------------------------------

    Attachment: HDFS-4273.v7.patch

Updated the patch. Changes:
1. Rebased onto current trunk.
2. The local DN in deadNodes can now expire; once it expires, it is removed from deadNodes.
3. Set the static constant LOCAL_DEADNODE_EXPIRE_MILLISECONDS to 10 minutes, so the local DN
expires after 10 minutes and read operations will then try to use the local DN again if possible.
Assuming a connection attempt to a dead local DN fails fast, the performance impact of the
extra retry should be small.

We can make LOCAL_DEADNODE_EXPIRE_MILLISECONDS configurable by adding it to dfsclient.conf,
if someone thinks it is necessary.
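
To make the expiry behaviour in items 2 and 3 concrete, here is a minimal sketch. It is not the code in HDFS-4273.v7.patch: the class name ExpiringDeadNodes, its methods, and the use of plain strings for datanode identities are all hypothetical, and the expiry interval is passed in as a constructor argument only to show how a configurable value could be plugged in.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical illustration of a dead-node set in which only the local
 * datanode expires. Not taken from the actual patch.
 */
class ExpiringDeadNodes {
  // Mirrors the 10-minute constant described in the message above.
  static final long LOCAL_DEADNODE_EXPIRE_MILLISECONDS = 10 * 60 * 1000L;

  // Maps a node name to the time (in ms) at which it was marked dead.
  private final Map<String, Long> deadNodes = new ConcurrentHashMap<>();
  private final String localNode;
  private final long expireMillis;

  ExpiringDeadNodes(String localNode, long expireMillis) {
    this.localNode = localNode;
    this.expireMillis = expireMillis;
  }

  void markDead(String node, long nowMillis) {
    deadNodes.put(node, nowMillis);
  }

  /**
   * Returns true if the node is currently considered dead. A local node
   * whose entry is older than expireMillis is removed and reported alive,
   * so the next read attempts it again.
   */
  boolean isDead(String node, long nowMillis) {
    Long markedAt = deadNodes.get(node);
    if (markedAt == null) {
      return false;
    }
    if (node.equals(localNode) && nowMillis - markedAt >= expireMillis) {
      deadNodes.remove(node);
      return false;
    }
    return true;
  }
}
```

Remote nodes never expire in this sketch; only the local DN gets a second chance, which matches the assumption that a failed connection to a dead local DN is cheap.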

> Problem in DFSInputStream read retry logic may cause early failure
> ------------------------------------------------------------------
>
>                 Key: HDFS-4273
>                 URL: https://issues.apache.org/jira/browse/HDFS-4273
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.0.2-alpha
>            Reporter: Binglin Chang
>            Assignee: Binglin Chang
>            Priority: Minor
>         Attachments: HDFS-4273-v2.patch, HDFS-4273.patch, HDFS-4273.v3.patch, HDFS-4273.v4.patch,
> HDFS-4273.v5.patch, HDFS-4273.v6.patch, HDFS-4273.v7.patch, TestDFSInputStream.java
>
>
> Assume the following call logic:
> {noformat} 
> readWithStrategy()
>   -> blockSeekTo()
>   -> readBuffer()
>      -> reader.doRead()
>      -> seekToNewSource() adds currentNode to deadNodes, hoping to get a different datanode
>         -> blockSeekTo()
>            -> chooseDataNode()
>               -> block missing, clears deadNodes and picks the currentNode again
>         seekToNewSource() returns false
>      readBuffer() re-throws the exception and quits the loop
> readWithStrategy() gets the exception, and may fail the read call before trying MaxBlockAcquireFailures times.
> {noformat} 
> Some issues with this logic:
> 1. seekToNewSource() is broken because deadNodes may be cleared in the middle of the call.
> 2. The variable "int retries=2" in readWithStrategy seems to conflict with MaxBlockAcquireFailures;
> should it be removed?
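
The first issue can be reproduced in a toy model. The sketch below is a heavily simplified stand-in, not DFSInputStream itself: the class DeadNodeClearSketch and its fields are hypothetical, replicas are plain strings, and chooseDataNode() here only imitates the "no live replica, clear deadNodes and start over" branch described in the call trace.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical toy model of the call trace; not the real DFSInputStream. */
class DeadNodeClearSketch {
  final List<String> replicas;
  final Set<String> deadNodes = new HashSet<>();
  String currentNode;

  DeadNodeClearSketch(List<String> replicas) {
    this.replicas = new ArrayList<>(replicas);
    this.currentNode = this.replicas.get(0);
  }

  /**
   * Picks the first replica that is not in deadNodes. If every replica is
   * dead (the "block missing" case), clears deadNodes and picks again,
   * which can hand back the node that was just marked dead.
   */
  String chooseDataNode() {
    for (String replica : replicas) {
      if (!deadNodes.contains(replica)) {
        return replica;
      }
    }
    deadNodes.clear();
    return replicas.get(0);
  }

  /** Returns true only if a node different from the old one was chosen. */
  boolean seekToNewSource() {
    String oldNode = currentNode;
    deadNodes.add(oldNode);
    currentNode = chooseDataNode();
    return !currentNode.equals(oldNode);
  }
}
```

With a single replica, seekToNewSource() marks the node dead, chooseDataNode() clears deadNodes and returns the same node, and seekToNewSource() reports false, so the caller gives up even though its retry budget has not been used.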



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
