hadoop-hdfs-issues mailing list archives

From "Bob Hansen (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-9103) Retry reads on DN failure
Date Thu, 05 Nov 2015 20:48:27 GMT

    [ https://issues.apache.org/jira/browse/HDFS-9103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14992443#comment-14992443 ]

Bob Hansen commented on HDFS-9103:

"I would like to see the FileHandle::Pread method implement the retry logic internally so
we have a simple "read all this data or completely fail" method rather than forcing partial
read and retry onto our consumer. Understanding the logic that these errors mean you should
retry, but this error means that you shouldn't retry could be abstracted away as a kindness
to the consumer."
I agree with this. I think the BadDataNodeTracker should be part of the filesystem; having the
user declare it complicates the API. When exclusion was just a set<string>, passing it in was
reasonable, but now that it is a more complicated class, passing it in might not be a good
fit for the API.

If I recall, Haohui Mai wanted the passing of failed nodes to be very explicit by design. Haohui,
do you have an opinion now that I've changed how failures are tracked? I think a reasonable
middle ground might be keeping the failed DN tracking mechanism internal but providing a hook
to ask for the failed datanodes that were tried during the read. Optionally passing in a pointer
to a vector of strings might work well for this.

It is a good thing to have a method where they are passed explicitly.  I would hope that because
we love our userbase, we also have a method where that is taken care of for them (both to
reduce cognitive load and errors in re-implementing code that should be done for them).  In
the HDFS-9144 refactoring, I have the easy-bake method that just passes in a buffer, size,
and offset (taken from the hdfs_cpp FileHandle API) while keeping a semi-stateless AsyncPReadSome
that takes explicit values for the active parameters (such as the dead data nodes).  I think
it's a good trade.

> Retry reads on DN failure
> -------------------------
>                 Key: HDFS-9103
>                 URL: https://issues.apache.org/jira/browse/HDFS-9103
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: hdfs-client
>            Reporter: Bob Hansen
>            Assignee: James Clampffer
>             Fix For: HDFS-8707
>         Attachments: HDFS-9103.1.patch, HDFS-9103.2.patch, HDFS-9103.HDFS-8707.006.patch,
>                      HDFS-9103.HDFS-8707.007.patch, HDFS-9103.HDFS-8707.3.patch,
>                      HDFS-9103.HDFS-8707.4.patch, HDFS-9103.HDFS-8707.5.patch
>
> When AsyncPreadSome fails, add the failed DataNode to the excluded list and try again.
