hadoop-hdfs-issues mailing list archives

From "Binglin Chang (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HDFS-4273) Problem in DFSInputStream read retry logic may cause early failure
Date Thu, 06 Dec 2012 13:24:59 GMT

     [ https://issues.apache.org/jira/browse/HDFS-4273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Binglin Chang updated HDFS-4273:

    Attachment: HDFS-4273.patch

In the notes below, read() refers to int readWithStrategy(ReaderStrategy strategy, int off, int len)
and pread() refers to int read(long position, byte[] buffer, int offset, int length).

1. Add a new argument "dislike" to chooseDataNode() and bestNode() to fix seekToNewSource().
2. Make failures a local variable so that pread() can be thread-safe.
3. In read(), let the outer layer handle BlockMissingException, bypassing seekToNewSource().
4. Remove the read retries, because MaxBlockAcquireFailures already handles retrying.
5. Throw ChecksumException iff we have tried enough times and there is only one replica available.
   In the original logic, whether ChecksumException or BlockMissingException is thrown is somewhat random,
   depending on the order of the locations returned by getLocatedBlocks().
   An alternative is to always throw BlockMissingException (like pread() does),
   but that breaks the current test cases.
6. In pread(), modify the code to follow the same retry logic as read().
   Note that the exception behavior of read() and pread() currently differs:
   read() sometimes throws ChecksumException, while pread() never does. The current
   patch keeps this behavior unchanged.
7. Add sanity checks for seek() and seekToNewSource().
8. Add a test to check that DFSInputStream tries MaxBlockAcquireFailures times under error.
9. Add test cases for seekToNewSource() mirroring the original test cases for seek().
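
To illustrate items 1 and 2, here is a minimal, self-contained sketch of how a "dislike" argument and a local failure counter could interact. It is not the code in the attached patch: the class name DislikeRetrySketch, the use of plain strings for datanodes, and the limit of 3 are assumptions made for illustration, and the real chooseDataNode() also refetches block locations and waits between attempts.

```java
import java.util.List;
import java.util.Set;

/** Hypothetical stand-in for the retry logic; not the actual DFSInputStream code. */
class DislikeRetrySketch {
    /** Assumed limit, playing the role of MaxBlockAcquireFailures. */
    static final int MAX_BLOCK_ACQUIRE_FAILURES = 3;

    /**
     * Returns the first replica that is neither in deadNodes nor equal to
     * the disliked node, or null when no such replica exists.
     */
    static String bestNode(List<String> replicas, Set<String> deadNodes, String dislike) {
        for (String node : replicas) {
            if (!deadNodes.contains(node) && !node.equals(dislike)) {
                return node;
            }
        }
        return null;
    }

    /**
     * Chooses a replica, clearing deadNodes when nothing is left to try.
     * Because "dislike" is checked separately from deadNodes, clearing
     * deadNodes can never hand back the node the caller wants to avoid.
     */
    static String chooseDataNode(List<String> replicas, Set<String> deadNodes, String dislike) {
        int failures = 0; // local, so concurrent callers do not share a counter
        while (true) {
            String node = bestNode(replicas, deadNodes, dislike);
            if (node != null) {
                return node;
            }
            failures++;
            if (failures >= MAX_BLOCK_ACQUIRE_FAILURES) {
                // Stand-in for BlockMissingException in this sketch.
                throw new IllegalStateException("could not obtain block after " + failures + " attempts");
            }
            deadNodes.clear(); // forget dead nodes and try again
        }
    }
}
```

With replicas dn1 and dn2 both marked dead and dn1 disliked, chooseDataNode() in this sketch clears deadNodes once and returns dn2. With dn1 as the only replica and also the disliked node, it gives up after the assumed three attempts instead of returning dn1 again.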

> Problem in DFSInputStream read retry logic may cause early failure
> ------------------------------------------------------------------
>                 Key: HDFS-4273
>                 URL: https://issues.apache.org/jira/browse/HDFS-4273
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Binglin Chang
>            Assignee: Binglin Chang
>            Priority: Minor
>         Attachments: HDFS-4273.patch, TestDFSInputStream.java
> Assume the following call logic
> {noformat} 
> readWithStrategy()
>   -> blockSeekTo()
>   -> readBuffer()
>      -> reader.doRead()
>      -> seekToNewSource() adds currentNode to deadNodes, hoping to get a different datanode
>         -> blockSeekTo()
>            -> chooseDataNode()
>               -> block missing, clears deadNodes and picks the currentNode again
>         seekToNewSource() returns false
>      readBuffer() re-throws the exception and quits the loop
> readWithStrategy() gets the exception and may fail the read call before it has tried MaxBlockAcquireFailures times.
> {noformat} 
> Some issues with this logic:
> 1. The seekToNewSource() logic is broken because it may clear deadNodes in the middle.
> 2. The variable "int retries=2" in readWithStrategy() seems to conflict with MaxBlockAcquireFailures;
> should it be removed?
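
The failure mode in the quoted call sequence can be reproduced with a small simulation. This is only a sketch of the described behavior under simplifying assumptions, not the actual DFSInputStream code: the class name SeekToNewSourceFlawSketch is hypothetical, datanodes are plain strings, and "block missing" is modeled as simply clearing deadNodes and scanning once more.

```java
import java.util.List;
import java.util.Set;

/** Hypothetical simulation of the flaw described above; not the real HDFS client code. */
class SeekToNewSourceFlawSketch {
    /**
     * Picks the first replica not in deadNodes. If every replica is dead,
     * this models the "block missing" path: clear deadNodes and scan again.
     */
    static String chooseDataNode(List<String> replicas, Set<String> deadNodes) {
        for (int attempt = 0; attempt < 2; attempt++) {
            for (String node : replicas) {
                if (!deadNodes.contains(node)) {
                    return node;
                }
            }
            deadNodes.clear(); // the step that undoes seekToNewSource()'s bookkeeping
        }
        return null; // reached only when replicas is empty
    }

    /**
     * Marks currentNode dead and asks for another node. Returns true only if
     * a different node was actually selected.
     */
    static boolean seekToNewSource(List<String> replicas, Set<String> deadNodes, String currentNode) {
        deadNodes.add(currentNode);
        String newNode = chooseDataNode(replicas, deadNodes);
        return newNode != null && !newNode.equals(currentNode);
    }
}
```

In this model, with replicas dn1 and dn2, dn2 already in deadNodes, and dn1 as the current node, seekToNewSource() adds dn1, finds no live replica, clears deadNodes, and selects dn1 again, so it returns false even though the caller asked for a different source.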

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
