hadoop-common-dev mailing list archives

From "Doug Cutting (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-1707) Remove the DFS Client disk-based cache
Date Mon, 15 Oct 2007 19:47:50 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-1707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12534952 ]

Doug Cutting commented on HADOOP-1707:
--------------------------------------

> The client gets an exception if the primary datanode fails.

Why can't it simply replace the primary with one of the secondary datanodes and proceed?

> If a secondary datanode fails, the primary informs the client about this event.

Since a secondary will typically fail by timing out, the timeout used between the client and
the primary must be longer than that used between the primary and secondary, so that the client
waits long enough to hear about a failed secondary.  And the timeout used between the application
and the client must be longer yet.  Right?  Perhaps we should make all these timeouts proportional
to a single configuration parameter, the application timeout?
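The nesting described above could be sketched as follows; the class, method names, and the particular ratios are hypothetical, not Hadoop's actual configuration keys, but they show how each layer's timeout stays strictly shorter than the one above it so failures propagate upward before the caller gives up:

```java
// Hypothetical sketch: derive the nested pipeline timeouts from one
// application-level timeout. Each layer waits long enough to hear
// about a failure one layer further down. Ratios are illustrative.
public class PipelineTimeouts {
    // Deepest link in the pipeline gets the shortest timeout.
    public static long primaryToSecondary(long appTimeoutMs) {
        return appTimeoutMs / 4;
    }

    // Client waits longer than the primary does, so it can still
    // receive the primary's report of a timed-out secondary.
    public static long clientToPrimary(long appTimeoutMs) {
        return appTimeoutMs / 2;
    }

    public static void main(String[] args) {
        long app = 60_000;  // the single configured parameter (ms)
        System.out.println("primary->secondary: " + primaryToSecondary(app));
        System.out.println("client->primary:    " + clientToPrimary(app));
        System.out.println("app->client:        " + app);
    }
}
```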

If we wish to ensure that blocks are sufficiently replicated, then we'll block on file close,
right?

Overall, this sounds like an approach worth trying.

> Remove the DFS Client disk-based cache
> --------------------------------------
>
>                 Key: HADOOP-1707
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1707
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>             Fix For: 0.16.0
>
>
> The DFS client currently uses a staging file on local disk to cache all user writes to
> a file. When the staging file accumulates one block's worth of data, its contents are
> flushed to an HDFS datanode. These operations occur sequentially.
> A simple optimization, allowing the user to write to another staging file while
> simultaneously uploading the contents of the first staging file to HDFS, would improve
> file-upload performance.
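The overlap the issue proposes can be illustrated with a small double-buffering sketch; this uses in-memory buffers and a stand-in `send` method, not the real DFSClient code, and all names here are hypothetical:

```java
import java.util.Arrays;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch of the proposed optimization: while the previous
// staging buffer is being uploaded in the background, the caller keeps
// filling the next one, instead of filling and uploading sequentially.
public class DoubleBufferedUpload {
    static final int BLOCK = 4;  // tiny "block size" for the demo

    public static int uploadAll(byte[] data) throws Exception {
        ExecutorService uploader = Executors.newSingleThreadExecutor();
        Future<?> inFlight = null;
        int blocks = 0;
        for (int off = 0; off < data.length; off += BLOCK) {
            // Fill the next staging buffer while the previous upload runs.
            byte[] staged = Arrays.copyOfRange(
                    data, off, Math.min(off + BLOCK, data.length));
            if (inFlight != null) {
                inFlight.get();  // wait for the previous block to finish
            }
            inFlight = uploader.submit(() -> send(staged));
            blocks++;
        }
        if (inFlight != null) {
            inFlight.get();  // drain the last in-flight upload
        }
        uploader.shutdown();
        return blocks;
    }

    // Stand-in for the real transfer to a datanode.
    private static void send(byte[] block) {
        // network write elided in this sketch
    }
}
```

With sequential staging, block N+1 cannot be buffered until block N has been sent; with the two-buffer scheme above, the fill of block N+1 overlaps the send of block N, which is where the upload speedup comes from.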

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

