hadoop-common-dev mailing list archives

From "dhruba borthakur (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-1707) Remove the DFS Client disk-based cache
Date Tue, 30 Oct 2007 04:21:50 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-1707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12538657 ]

dhruba borthakur commented on HADOOP-1707:
------------------------------------------

I agree that the timeout issue does not have a very elegant solution. Here is a new proposal.

The Client
--------------
1. The client uses a small pool of memory buffers per dfs-output stream: say, 10 buffers of
size 64K each.
2. A write to the output stream copies the user data into one of these buffers, if one is
available. Otherwise the user-write blocks.
3. A separate thread (one per output stream) sends buffers that are full. Each buffer carries
metadata containing a sequence number (locally generated on the client), the length of the
buffer, and its offset in the block.
4. Another thread (one per output stream) processes incoming responses. Each response carries
the sequence number of a buffer that the datanodes have processed; the client removes that
buffer from its queue. (A sketch of this scheme follows the list.)
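
To make the threading concrete, here is a minimal Java sketch of the client side. The Packet
class, the queue names, and the abstract I/O methods are hypothetical illustrations, not the
actual DFSClient code; only the buffer count (10) and buffer size (64K) come from the proposal
above.

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical sketch of the per-output-stream client machinery.
abstract class ClientStreamSketch {
  static final int NUM_BUFFERS = 10;        // small pool per dfs-output stream
  static final int BUFFER_SIZE = 64 * 1024; // 64K per buffer

  // A buffer plus the metadata from item 3.
  static class Packet {
    final long seqno;          // client-generated sequence number
    final long offsetInBlock;  // offset of this buffer in the block
    final byte[] data = new byte[BUFFER_SIZE];
    int length;                // bytes actually used
    Packet(long seqno, long offsetInBlock) {
      this.seqno = seqno;
      this.offsetInBlock = offsetInBlock;
    }
  }

  // Full buffers waiting to be sent; the bound makes a user-write block
  // when all NUM_BUFFERS buffers are in flight (item 2).
  final BlockingQueue<Packet> dataQueue = new ArrayBlockingQueue<>(NUM_BUFFERS);
  // Buffers sent to the pipeline but not yet acknowledged.
  final BlockingQueue<Packet> ackQueue = new ArrayBlockingQueue<>(NUM_BUFFERS);

  // Sender thread (item 3): streams full buffers to the primary datanode.
  final Thread sender = new Thread(() -> {
    try {
      while (true) {
        Packet p = dataQueue.take();  // blocks until a buffer is full
        sendToPrimaryDatanode(p);     // write seqno, offset, length, data
        ackQueue.put(p);              // now awaiting its ack
      }
    } catch (InterruptedException e) { /* stream closed */ }
  });

  // Response thread (item 4): an ack names the seqno the datanodes have
  // processed; the matching buffer is removed from the queue for reuse.
  final Thread responder = new Thread(() -> {
    try {
      while (true) {
        long ackedSeqno = readAckFromPipeline();
        Packet p = ackQueue.take();
        if (p.seqno != ackedSeqno) {
          throw new IllegalStateException("out-of-order ack");
        }
        // p.data can now be returned to the free pool
      }
    } catch (InterruptedException e) { /* stream closed */ }
  });

  // Placeholders for the actual socket I/O.
  abstract void sendToPrimaryDatanode(Packet p);
  abstract long readAckFromPipeline();
}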

The Primary Datanode
------------------------------
The primary datanode has two threads per stream. The first thread processes incoming packets
from the client, forwarding each one to the downstream datanode and writing it to local disk.
The second thread processes responses from downstream datanodes and forwards them back to the
client.

This means that the client gets back an ack only when the packet is persisted on all datanodes
in the pipeline. In the future this can be relaxed so that the client gets an ack once the data
is persisted on dfs.replication.min datanodes.
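
A rough Java sketch of the two per-stream threads on the primary datanode follows. The wire
format (a long seqno, an int length, then the bytes) and the negative-length end-of-block
marker are assumptions for illustration; the point is the ordering: a packet is forwarded and
persisted locally before its ack can be relayed back upstream.

import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;

// Hypothetical sketch; not the actual DataNode code.
class PrimaryDatanodeSketch {
  // Thread 1: client -> downstream datanode + local disk.
  static void receiverLoop(DataInputStream fromClient,
                           DataOutputStream toDownstream,
                           FileOutputStream localBlockFile) throws IOException {
    byte[] buf = new byte[64 * 1024];
    while (true) {
      long seqno = fromClient.readLong();
      int len = fromClient.readInt();
      if (len < 0) break;                // assumed end-of-block marker
      fromClient.readFully(buf, 0, len);
      toDownstream.writeLong(seqno);     // forward the packet first...
      toDownstream.writeInt(len);
      toDownstream.write(buf, 0, len);
      localBlockFile.write(buf, 0, len); // ...then persist it locally
    }
  }

  // Thread 2: downstream ack -> client. Because an ack is only relayed after
  // this node and every downstream node have the packet, the client sees an
  // ack only once the packet is persisted on all datanodes.
  static void ackLoop(DataInputStream fromDownstream,
                      DataOutputStream toClient) throws IOException {
    while (true) {
      long ackedSeqno = fromDownstream.readLong();
      toClient.writeLong(ackedSeqno);
      toClient.flush();
    }
  }
}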

In case the primary datanode encounters an exception while writing to the downstream datanode,
it declares the block as bad. It removes the immediate downstream datanode from the pipeline.
It makes an RPC to the namenode to abandon the current blockId and *replace* it with a new
one. It then establishes a new pipeline over the remaining datanodes using the new blockId,
and copies all the data from the local temporary block file to the downstream datanodes under
the new blockId.
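
An illustrative sketch of that recovery path. The namenode RPC names here (abandonBlock,
allocateReplacementBlock) are stand-ins for whatever the real protocol would provide, and the
connection setup and block transfer are left as placeholders:

import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the recovery steps above; not real DataNode code.
class PipelineRecoverySketch {
  // Stand-ins for the namenode RPCs the paragraph describes.
  interface NamenodeRpc {
    void abandonBlock(long badBlockId);
    long allocateReplacementBlock(long badBlockId);
  }

  static long recover(NamenodeRpc namenode, long badBlockId,
                      List<String> pipeline, String failedNode,
                      File localTempBlockFile) throws IOException {
    // 1. Drop the failed datanode from the pipeline.
    List<String> survivors = new ArrayList<>(pipeline);
    survivors.remove(failedNode);

    // 2. Abandon the bad blockId and obtain a replacement from the namenode.
    namenode.abandonBlock(badBlockId);
    long newBlockId = namenode.allocateReplacementBlock(badBlockId);

    // 3. Re-establish the pipeline over the survivors, then replay all the
    //    data from the local temporary block file under the new blockId.
    openPipeline(survivors, newBlockId);
    replayLocalBlockFile(localTempBlockFile, newBlockId);
    return newBlockId;
  }

  // Placeholders for the actual connection setup and block transfer.
  static void openPipeline(List<String> nodes, long blockId) { }
  static void replayLocalBlockFile(File f, long blockId) throws IOException { }
}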

The Secondary Datanodes
------------------------------------
The secondary datanode has two threads per stream. The first thread processes incoming packets
from the upstream datanode, forwarding each one to the downstream datanode and writing it to
local disk. The second thread processes responses from downstream datanodes and forwards them
back to the upstream datanode.

Each secondary datanode sends its own response as well as forwarding the responses of all
downstream datanodes.
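
A sketch of that ack path on a secondary (non-terminal) datanode. The wire format here (a
seqno followed by one status byte per datanode) is an assumption; the essential behaviour is
that the node prepends its own response before relaying everything it hears from downstream:

import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical sketch of the response-forwarding thread on a secondary datanode.
class SecondaryAckSketch {
  static final byte OP_STATUS_SUCCESS = 0; // illustrative status code

  static void ackLoop(DataInputStream fromDownstream,
                      DataOutputStream toUpstream,
                      int numDownstreamNodes) throws IOException {
    while (true) {
      long seqno = fromDownstream.readLong();  // ack for this packet
      toUpstream.writeLong(seqno);
      toUpstream.writeByte(OP_STATUS_SUCCESS); // this node's own response
      for (int i = 0; i < numDownstreamNodes; i++) {
        // ...followed by the responses of all downstream datanodes.
        toUpstream.writeByte(fromDownstream.readByte());
      }
      toUpstream.flush();
    }
  }
}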



> Remove the DFS Client disk-based cache
> --------------------------------------
>
>                 Key: HADOOP-1707
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1707
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>             Fix For: 0.16.0
>
>
> The DFS client currently uses a staging file on local disk to cache all user-writes to
> a file. When the staging file accumulates 1 block worth of data, its contents are flushed
> to an HDFS datanode. These operations occur sequentially.
> A simple optimization of allowing the user to write to another staging file while simultaneously
> uploading the contents of the first staging file to HDFS will improve file-upload performance.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

