hadoop-common-dev mailing list archives

From "dhruba borthakur (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HADOOP-1707) Remove the DFS Client disk-based cache
Date Tue, 13 Nov 2007 09:08:52 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-1707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dhruba borthakur updated HADOOP-1707:
-------------------------------------

    Attachment: clientDiskBuffer6.patch

This patch removes the client-side disk buffer.

1. FSConstants.java : Bumped up DATA_TRANSFER_VERSION.
2. Daemon.java: Added a ThreadGroup to the Daemon class. All worker threads that process data
transfers belong to this group. The shutdown of a datanode waits for the entire thread group
to exit. Prior to this change, a datanode shutdown did not wait for the data transfer threads
to exit.
3. FSNamesystem.java: Allows a zero size file to have no blocks associated with it.
4. DataChecksum.java: A utility method to return the size of a checksum header.
5. FSDataset.java: The ongoingCreates data structure remembers the thread that is currently
writing to a block. The writeToBlock() method (when the recovery flag is set) terminates any
existing threads that might have been writing to a block before allowing a new thread to write
to the same block.
6. FSDataOutputStream.java: The unit test needed to extract the pipeline associated with a
block. This is facilitated by exposing a new public API called getWrappedStream() that returns
the underlying DFSOutputStream object.
7. MiniDFSCluster.java: Allows stopping a particular datanode.
8. DFSClient.java/DataNode.java: User data is transferred in the form of packets. Each packet
requires an ack from all datanodes. The DFSClient drives the entire recovery strategy. A keepalive
is sent every READ_TIMEOUT/2 on the response socket channel. Each packet is 64K in
size and the client has a sliding window of 5MB per stream.
9. TestDatanodeDeath.java: A unit test to trigger datanode deaths and DFSClient recovery.
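
To illustrate item 8, here is a minimal sketch of the window arithmetic: a 5MB window of 64K packets
allows at most 80 unacknowledged packets per stream. This is not the code in the patch. The class
name PacketWindowSketch and the methods trySend(), ack() and outstanding() are hypothetical, and the
sketch assumes acks arrive in sequence order, which the notes above do not state.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical sketch of a per-stream sliding window of unacknowledged packets. */
public class PacketWindowSketch {
    // Sizes taken from the patch notes: 64K packets, 5MB window per stream.
    static final int PACKET_SIZE = 64 * 1024;
    static final int WINDOW_BYTES = 5 * 1024 * 1024;
    // 5MB / 64K = 80 packets may be in flight before the writer must wait.
    static final int MAX_OUTSTANDING = WINDOW_BYTES / PACKET_SIZE;

    // Sequence numbers of packets sent but not yet acked, oldest first.
    private final Deque<Long> unacked = new ArrayDeque<>();
    private long nextSeqno = 0;

    /** Returns the sequence number of the packet sent, or -1 if the window is full. */
    public synchronized long trySend() {
        if (unacked.size() >= MAX_OUTSTANDING) {
            return -1; // caller must wait for an ack before sending more
        }
        long seqno = nextSeqno++;
        unacked.addLast(seqno);
        return seqno;
    }

    /**
     * Records that all datanodes acked the given packet.
     * Assumption: acks arrive in order, so only the oldest packet can be acked.
     */
    public synchronized boolean ack(long seqno) {
        Long oldest = unacked.peekFirst();
        if (oldest == null || oldest != seqno) {
            return false; // unexpected ack; a real client would start recovery here
        }
        unacked.removeFirst();
        return true;
    }

    /** Number of packets currently awaiting acks. */
    public synchronized int outstanding() {
        return unacked.size();
    }
}
```

A caller would invoke trySend() for each 64K packet and block once it returns -1; each ack() for the
oldest packet frees one slot in the window.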




> Remove the DFS Client disk-based cache
> --------------------------------------
>
>                 Key: HADOOP-1707
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1707
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>             Fix For: 0.16.0
>
>         Attachments: clientDiskBuffer.patch, clientDiskBuffer2.patch, clientDiskBuffer6.patch
>
>
> The DFS client currently uses a staging file on local disk to cache all user-writes to
> a file. When the staging file accumulates 1 block worth of data, its contents are flushed
> to an HDFS datanode. These operations occur sequentially.
> A simple optimization of allowing the user to write to another staging file while simultaneously
> uploading the contents of the first staging file to HDFS will improve file-upload performance.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

