hadoop-hdfs-issues mailing list archives

From "Binglin Chang (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HDFS-4273) Fix some issue in DFSInputstream
Date Mon, 06 Jan 2014 09:06:56 GMT

     [ https://issues.apache.org/jira/browse/HDFS-4273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Binglin Chang updated HDFS-4273:
--------------------------------

    Description: 
The following issues in DFSInputStream are addressed in this JIRA:
1. A read may not retry enough times in some cases, causing an early failure.
Assume the following call sequence:
{noformat} 
readWithStrategy()
  -> blockSeekTo()
  -> readBuffer()
     -> reader.doRead()
     -> seekToNewSource() adds currentNode to deadNodes, hoping to get a different datanode
        -> blockSeekTo()
           -> chooseDataNode()
              -> block missing: clears deadNodes and picks currentNode again
        seekToNewSource() returns false
     readBuffer() re-throws the exception and quits the loop
readWithStrategy() gets the exception and may fail the read call before trying MaxBlockAcquireFailures times.
{noformat} 
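
One way to avoid the early failure traced above is to treat "every replica has been marked dead" as one failed round that is counted against the retry limit, instead of letting the exception end the read. The sketch below only illustrates that idea and is not the attached patch: RetryingReadSketch, NodeReader, readWithRetries and maxFailures are hypothetical names, with maxFailures standing in for MaxBlockAcquireFailures.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch; none of these names come from DFSInputStream.
class RetryingReadSketch {
    /** Stand-in for one read attempt against a single datanode. */
    interface NodeReader {
        int read(String node) throws IOException;
    }

    static int readWithRetries(List<String> replicas, NodeReader reader, int maxFailures)
            throws IOException {
        Set<String> deadNodes = new HashSet<>();
        int failures = 0;          // failed rounds seen by this call only
        IOException last = null;
        while (failures < maxFailures) {
            // Pick the first replica that is not currently marked dead.
            String chosen = null;
            for (String node : replicas) {
                if (!deadNodes.contains(node)) {
                    chosen = node;
                    break;
                }
            }
            if (chosen == null) {
                // All replicas failed: count one round, clear deadNodes, try again.
                failures++;
                deadNodes.clear();
                continue;
            }
            try {
                return reader.read(chosen);
            } catch (IOException e) {
                last = e;
                deadNodes.add(chosen);
            }
        }
        throw new IOException("read failed after " + maxFailures + " rounds", last);
    }

    public static void main(String[] args) throws IOException {
        int[] attempts = {0};
        // The single replica fails twice, then succeeds on the third attempt.
        int value = readWithRetries(Arrays.asList("dn1"), node -> {
            if (++attempts[0] < 3) {
                throw new IOException("transient failure on " + node);
            }
            return 7;
        }, 3);
        System.out.println("value=" + value + " attempts=" + attempts[0]);
    }
}
```

With a single replica and a limit of 3, the read in main succeeds on the third attempt instead of failing after the first exception.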

2. In multi-threaded scenarios (such as HBase), DFSInputStream.failures has a race condition:
it can be cleared to 0 while it is still in use by another thread, so some read threads
may never exit.
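
A common way to remove this kind of race is to keep the failure count in a local variable of each read call, so that no other thread can reset it. The sketch below shows that pattern in isolation, under the assumption that a per-call budget is acceptable; it is not a description of how the patch changes DFSInputStream, and PerCallFailureCountSketch, callWithBudget, runFailingReaders and MAX_FAILURES are hypothetical names.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: the retry budget is a local variable of each call.
class PerCallFailureCountSketch {
    static final int MAX_FAILURES = 3; // illustrative limit

    /** Retries the attempt until it succeeds or has failed MAX_FAILURES times. */
    static <T> T callWithBudget(Callable<T> attempt) throws Exception {
        int failures = 0; // per call, so no other thread can reset it to 0
        while (true) {
            try {
                return attempt.call();
            } catch (Exception e) {
                if (++failures >= MAX_FAILURES) {
                    throw e; // budget exhausted: this caller gives up
                }
            }
        }
    }

    /** Runs concurrent always-failing reads; returns the attempts each one made. */
    static List<Integer> runFailingReaders(int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (int i = 0; i < threads; i++) {
                futures.add(pool.submit(() -> {
                    AtomicInteger attempts = new AtomicInteger();
                    try {
                        callWithBudget(() -> {
                            attempts.incrementAndGet();
                            throw new IOException("datanode unreachable");
                        });
                    } catch (Exception expected) {
                        // Each reader terminates once its own budget is spent.
                    }
                    return attempts.get();
                }));
            }
            List<Integer> result = new ArrayList<>();
            for (Future<Integer> f : futures) {
                result.add(f.get());
            }
            return result;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(runFailingReaders(4)); // prints [3, 3, 3, 3]
    }
}
```

Because no state is shared between callers, every reader stops after exactly MAX_FAILURES attempts regardless of what the other threads do.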

3. If the local datanode is added to deadNodes, it is not removed from deadNodes when the
datanode comes back alive. We need a way to remove the local datanode from deadNodes once
it becomes live again.
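
The description does not say how the local datanode should be removed from deadNodes. As one possible approach only, the sketch below re-checks the local node with a liveness probe whenever it is looked up and drops it from the dead set if the probe succeeds. DeadNodeTrackerSketch, livenessProbe, markDead and isDead are hypothetical names, and the probe is an assumed cheap check (for example a connection attempt), not an existing HDFS API.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Predicate;

// Hypothetical sketch of dead-node bookkeeping; not the DFSInputStream code.
class DeadNodeTrackerSketch {
    private final Set<String> deadNodes = ConcurrentHashMap.newKeySet();
    private final String localNode;
    private final Predicate<String> livenessProbe; // assumed cheap liveness check

    DeadNodeTrackerSketch(String localNode, Predicate<String> livenessProbe) {
        this.localNode = localNode;
        this.livenessProbe = livenessProbe;
    }

    void markDead(String node) {
        deadNodes.add(node);
    }

    /** Returns true if the node should be skipped; only the local node is re-probed. */
    boolean isDead(String node) {
        if (!deadNodes.contains(node)) {
            return false;
        }
        if (node.equals(localNode) && livenessProbe.test(node)) {
            deadNodes.remove(node); // local datanode is back: make it eligible again
            return false;
        }
        return true;
    }

    public static void main(String[] args) {
        boolean[] localUp = {false};
        DeadNodeTrackerSketch tracker =
                new DeadNodeTrackerSketch("local-dn", node -> localUp[0]);
        tracker.markDead("local-dn");
        System.out.println(tracker.isDead("local-dn")); // prints true
        localUp[0] = true;
        System.out.println(tracker.isDead("local-dn")); // prints false
    }
}
```

Probing only the local node keeps the extra cost small while restoring the preferred read path once it is available again; remote nodes stay in the set until it is cleared by other means.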

  was:
The following issues in DFSInputStream are addressed in this JIRA:
1. A read may not retry enough times in some cases, causing an early failure.
Assume the following call sequence:
{noformat} 
readWithStrategy()
  -> blockSeekTo()
  -> readBuffer()
     -> reader.doRead()
     -> seekToNewSource() adds currentNode to deadNodes, hoping to get a different datanode
        -> blockSeekTo()
           -> chooseDataNode()
              -> block missing: clears deadNodes and picks currentNode again
        seekToNewSource() returns false
     readBuffer() re-throws the exception and quits the loop
readWithStrategy() gets the exception and may fail the read call before trying MaxBlockAcquireFailures times.
{noformat} 

2. In multi-threaded scenarios (such as HBase), DFSInputStream.failures has a race condition:
it can be cleared to 0 while it is still in use by another thread, so some read threads
may never exit.

3. 


> Fix some issue in DFSInputstream
> --------------------------------
>
>                 Key: HDFS-4273
>                 URL: https://issues.apache.org/jira/browse/HDFS-4273
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.0.2-alpha
>            Reporter: Binglin Chang
>            Assignee: Binglin Chang
>            Priority: Minor
>         Attachments: HDFS-4273-v2.patch, HDFS-4273.patch, HDFS-4273.v3.patch, HDFS-4273.v4.patch, HDFS-4273.v5.patch, HDFS-4273.v6.patch, HDFS-4273.v7.patch, TestDFSInputStream.java
>



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
