hadoop-common-dev mailing list archives

From "Doug Cutting (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-1707) DFS client can allow user to write data to the next block while uploading previous block to HDFS
Date Thu, 11 Oct 2007 17:09:51 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-1707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12534092 ]

Doug Cutting commented on HADOOP-1707:

> Another option would be to say that the application gets an error if the primary datanode
> fails. Do you think that this is acceptable?

Perhaps, if it only happens rarely.  If, e.g., sorts generally complete on 900 nodes with
no such failures, then this is probably acceptable.  If the primary datanode is localhost,
and if secondary failures are survivable, then this may work well enough.

Otherwise, how do we recover when a datanode in the pipeline becomes unreachable?  Will we
use per-buffer acks, so that the primary datanode won't ack a buffer until all datanodes in the
pipeline have it?  Then, if one datanode fails, we could route around it, initialize its copy of
the block from one of the survivors, and continue.  The acking will effectively add flow control,
which could be a feature or could slow things down.  Datanodes may receive the same buffer twice,
so buffers will need revision numbers or something similar.
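
The recovery idea above can be illustrated with a small in-memory simulation. This is only a sketch under stated assumptions, not the HDFS implementation: `DataNode`, `send_buffer`, and the `spares` list are hypothetical names, a sequence number stands in for the "revision number", and a failure is modeled as a raised `ConnectionError`.

```python
class DataNode:
    """Hypothetical in-memory stand-in for a datanode holding one block."""

    def __init__(self, name):
        self.name = name
        self.alive = True
        self.buffers = {}  # sequence number -> bytes

    def receive(self, seq, data):
        if not self.alive:
            raise ConnectionError(f"{self.name} is unreachable")
        # A resent buffer carries a sequence number that was already stored,
        # so the duplicate is ignored and the first copy is kept.
        self.buffers.setdefault(seq, data)

    def block(self):
        """Reassemble the block contents in sequence order."""
        return b"".join(self.buffers[s] for s in sorted(self.buffers))


def send_buffer(pipeline, spares, seq, data):
    """Deliver one buffer to every node and return seq as the ack.

    The ack is produced only after all nodes currently in the pipeline
    have stored the buffer. Both list arguments are modified in place.
    """
    while True:
        failed = None
        for node in pipeline:
            try:
                node.receive(seq, data)
            except ConnectionError:
                failed = node
                break
        if failed is None:
            return seq
        # Route around the unreachable node.
        pipeline.remove(failed)
        if not pipeline:
            raise IOError("no surviving replicas")
        if spares:
            # Initialize a replacement's copy of the block from a survivor.
            replacement = spares.pop(0)
            replacement.buffers = dict(pipeline[0].buffers)
            pipeline.append(replacement)
        # The loop resends the buffer; nodes that already hold it see a duplicate.


if __name__ == "__main__":
    a, b, c, d = (DataNode(n) for n in "abcd")
    pipeline, spares = [a, b, c], [d]
    send_buffer(pipeline, spares, 0, b"hello ")
    b.alive = False  # simulate a datanode becoming unreachable mid-block
    send_buffer(pipeline, spares, 1, b"world")
    print([n.name for n in pipeline])
```

Because the client does not send buffer N+1 until buffer N is acked, the ack doubles as the flow control mentioned above; a real protocol would likely allow a window of unacked buffers rather than one at a time.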

> DFS client can allow user to write data to the next block while uploading previous block
> ------------------------------------------------------------------------------------------------
>                 Key: HADOOP-1707
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1707
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
> The DFS client currently uses a staging file on local disk to cache all user-writes to
> a file. When the staging file accumulates one block's worth of data, its contents are flushed
> to an HDFS datanode. These operations occur sequentially.
> A simple optimization of allowing the user to write to another staging file while simultaneously
> uploading the contents of the first staging file to HDFS will improve file-upload performance.
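
The optimization described in the issue can be sketched as a writer that hands each full staging buffer to a background thread and keeps accepting writes. This is a minimal illustration, not the DFS client code: `DoubleBufferedWriter` and the `upload` callable are hypothetical, and an in-memory `io.BytesIO` stands in for the staging file on local disk.

```python
import io
import queue
import threading


class DoubleBufferedWriter:
    """Accept writes into a staging buffer while earlier blocks upload."""

    def __init__(self, block_size, upload):
        self.block_size = block_size  # must be a positive number of bytes
        self.upload = upload          # hypothetical callable: upload(block_bytes)
        self.staging = io.BytesIO()
        # At most one finished block may wait behind the upload in progress.
        self.pending = queue.Queue(maxsize=1)
        self.error = None
        self.worker = threading.Thread(target=self._drain, daemon=True)
        self.worker.start()

    def _drain(self):
        # Single worker thread, so blocks are uploaded in the order written.
        while True:
            block = self.pending.get()
            if block is None:
                return
            if self.error is None:
                try:
                    self.upload(block)
                except Exception as exc:
                    self.error = exc  # reported to the writer on its next hand-off

    def write(self, data):
        while data:
            room = self.block_size - self.staging.tell()
            self.staging.write(data[:room])
            data = data[room:]
            if self.staging.tell() == self.block_size:
                self._hand_off()

    def _hand_off(self):
        if self.error is not None:
            raise self.error
        # Blocks only when the queue already holds a finished block.
        self.pending.put(self.staging.getvalue())
        self.staging = io.BytesIO()

    def close(self):
        if self.staging.tell():
            self._hand_off()  # flush the final partial block
        self.pending.put(None)
        self.worker.join()
        if self.error is not None:
            raise self.error


if __name__ == "__main__":
    uploaded = []
    writer = DoubleBufferedWriter(4, uploaded.append)
    writer.write(b"abcdefghij")
    writer.close()
    print(uploaded)
```

The bounded queue keeps the amount of staged-but-not-uploaded data limited, so a slow upload eventually stalls the writer instead of letting staging data grow without bound.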

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
