hadoop-common-dev mailing list archives

From "Doug Cutting (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-1707) DFS client can allow user to write data to the next block while uploading previous block to HDFS
Date Wed, 10 Oct 2007 21:53:50 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-1707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12533885 ]

Doug Cutting commented on HADOOP-1707:

bq. If a datanode fails to write a buffer to its disk, it is reported back to the client.
The client removes this datanode from the pipeline and continues to write to the remaining
two datanodes. [ ... ] When the file is closed, the under-replicated blocks will be replicated
by the namenode.

I think the more typical failure mode will be a timeout.  I'm also still not sure of the answer
to my question: if the first datanode in the pipeline times out, does the write fail, throwing
an exception to the client?  Or does the client route around the first datanode in the pipeline
and continue until all datanodes in the pipeline time out?  If so, how can it be sure that
the other datanodes have received their copies of prior chunks from the first datanode in
the pipeline?

Also, HADOOP-1927 states that we should fail as soon as any element in the pipeline fails.
 Do you agree?  Currently this would be invisible to clients, since the entire block can be
replayed to a new pipeline.  But, without a local file, this would force us to fail the write
when any element of the pipeline fails.  Thoughts?
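The recovery behavior quoted at the top of this comment can be sketched roughly as follows. This is a hypothetical illustration of the described client-side logic, not the actual DFSClient code; the class and method names are invented for the sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: when a datanode reports a disk-write failure, the client drops it
// from the pipeline and keeps writing to the survivors. The write as a whole
// fails only when no replica targets remain. (Illustrative names only.)
class WritePipeline {
    private final List<String> datanodes;

    WritePipeline(List<String> datanodes) {
        this.datanodes = new ArrayList<>(datanodes);
    }

    // Remove a datanode that reported a failure; return whether the write
    // can continue (i.e., at least one replica target remains).
    boolean handleFailure(String failedNode) {
        datanodes.remove(failedNode);
        return !datanodes.isEmpty();
    }

    int replicaCount() {
        return datanodes.size();
    }
}
```

Note that this sketch sidesteps the question raised above: it assumes the survivors already hold every prior chunk, which is exactly what is in doubt when the *first* datanode in the pipeline times out.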

> DFS client can allow user to write data to the next block while uploading previous block
> ------------------------------------------------------------------------------------------------
>                 Key: HADOOP-1707
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1707
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
> The DFS client currently uses a staging file on local disk to cache all user writes to
> a file. When the staging file accumulates one block worth of data, its contents are flushed
> to an HDFS datanode. These operations occur sequentially.
> A simple optimization of allowing the user to write to another staging file while simultaneously
> uploading the contents of the first staging file to HDFS would improve file-upload performance.
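The proposed overlap of writing and uploading amounts to double buffering: the user fills one staging buffer while a background thread ships the previously filled one to a datanode. A minimal sketch, assuming a single uploader thread and a hand-off queue (all names here are illustrative, not the actual DFSClient implementation):

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Sketch: overlap user writes with block uploads. A capacity-1 queue means
// the writer can stage at most one block ahead of the uploader, mirroring
// the two-staging-file scheme described in the issue.
class DoubleBufferedWriter implements AutoCloseable {
    private final BlockingQueue<byte[]> fullBlocks = new ArrayBlockingQueue<>(1);
    private final Thread uploader;
    private volatile int blocksUploaded = 0;

    DoubleBufferedWriter() {
        uploader = new Thread(() -> {
            try {
                while (true) {
                    byte[] block = fullBlocks.take();  // wait for a filled staging buffer
                    if (block.length == 0) break;      // zero-length block = shutdown signal
                    uploadToDatanode(block);
                }
            } catch (InterruptedException ignored) {
            }
        });
        uploader.start();
    }

    // Called when a staging buffer reaches one block of data; hands the block
    // to the uploader so the caller can immediately start filling the next one.
    void blockReady(byte[] block) throws InterruptedException {
        fullBlocks.put(block);  // blocks only if the uploader is a full block behind
    }

    private void uploadToDatanode(byte[] block) {
        blocksUploaded++;  // stand-in for the real pipeline write
    }

    @Override
    public void close() throws InterruptedException {
        fullBlocks.put(new byte[0]);  // shutdown signal
        uploader.join();
    }

    int uploaded() {
        return blocksUploaded;
    }
}
```

In this shape the failure-handling questions above become concrete: if the uploader hits a pipeline failure, the writer may already be a block ahead, so the client must either replay the staged block to a new pipeline or surface the failure to the user.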

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
