hadoop-hdfs-issues mailing list archives

From "Todd Lipcon (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HDFS-895) Allow hflush/sync to occur in parallel with new writes to the file
Date Thu, 15 Apr 2010 08:58:52 GMT

     [ https://issues.apache.org/jira/browse/HDFS-895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon updated HDFS-895:
-----------------------------

    Attachment: hdfs-895.txt

Here's a preliminary patch against 0.20 sync (I will forward-port it, but HBase on 0.20 makes
a good testing ground). It needs a thorough code review, since this is tricky code, but
the general idea is simple enough. I also want to extend the unit test to do some data verification.

The included test case can also be run as a benchmark: it runs 10 threads, each of which
just appends 511-byte chunks and calls sync after each one. With the patched DFSClient, it runs
in about 33 seconds on my test cluster. Without the patched DFSClient it took 290 seconds
(and jstack shows most threads blocked most of the time). This confirms what we expected:
there's a lot of parallelism to be gained for multithreaded writers.

> Allow hflush/sync to occur in parallel with new writes to the file
> ------------------------------------------------------------------
>
>                 Key: HDFS-895
>                 URL: https://issues.apache.org/jira/browse/HDFS-895
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs client
>            Reporter: dhruba borthakur
>            Assignee: Todd Lipcon
>         Attachments: hdfs-895.txt
>
>
> In the current trunk, the HDFS client methods writeChunk() and hflush/sync are synchronized.
> This means that if an hflush/sync is in progress, an application cannot write data to the
> HDFS client buffer. This reduces the write throughput of the transaction log in HBase.
> The hflush/sync should allow new writes to happen to the HDFS client even when an hflush/sync
> is in progress. It can record the seqno of the message for which it should receive the ack,
> indicate to the DataStreamer thread to start flushing those messages, exit the synchronized section,
> and just wait for that ack to arrive.
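
The approach described in the issue can be sketched roughly as follows. This is a
minimal, self-contained illustration and not the attached patch or the real DFSClient
code: the class ParallelFlushSketch, its fields (streamLock, queueLock, dataQueue) and
its methods are hypothetical names, and the background thread only simulates the
streamer by acknowledging packets immediately instead of sending them to a pipeline.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical sketch: flush() records a target seqno under the stream lock,
// releases that lock, and only then waits for the ack, so writers are not blocked.
class ParallelFlushSketch {
    private final Object streamLock = new Object(); // stands in for the synchronized stream methods
    private final Object queueLock = new Object();  // guards the queue and the seqno counters
    private final Queue<Long> dataQueue = new ArrayDeque<>();
    private long lastQueuedSeqno = -1;
    private long lastAckedSeqno = -1;

    ParallelFlushSketch() {
        Thread streamer = new Thread(this::streamLoop, "streamer-sketch");
        streamer.setDaemon(true); // do not keep the JVM alive for the demo thread
        streamer.start();
    }

    /** Queues one chunk and returns its seqno. The payload is ignored in this sketch. */
    long write(byte[] chunk) {
        synchronized (streamLock) {
            synchronized (queueLock) {
                long seqno = ++lastQueuedSeqno;
                dataQueue.add(seqno);
                queueLock.notifyAll(); // wake the streamer
                return seqno;
            }
        }
    }

    /** Waits until everything queued so far is acked; returns the seqno waited for. */
    long flush() {
        long target;
        synchronized (streamLock) {
            synchronized (queueLock) {
                target = lastQueuedSeqno; // record the seqno whose ack we need
                queueLock.notifyAll();    // ask the streamer to start flushing
            }
        } // stream lock released here: other threads may call write() while we wait
        synchronized (queueLock) {
            try {
                while (lastAckedSeqno < target) {
                    queueLock.wait(); // wait() releases queueLock while blocked
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for ack", e);
            }
        }
        return target;
    }

    long lastAcked() {
        synchronized (queueLock) {
            return lastAckedSeqno;
        }
    }

    // Simulated streamer: a real client would send each packet and wait for the pipeline ack.
    private void streamLoop() {
        try {
            while (true) {
                synchronized (queueLock) {
                    while (dataQueue.isEmpty()) {
                        queueLock.wait();
                    }
                    lastAckedSeqno = dataQueue.remove(); // packets are acked in queue order
                    queueLock.notifyAll();               // wake any waiting flush()
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        ParallelFlushSketch out = new ParallelFlushSketch();
        Thread[] writers = new Thread[10];
        for (int i = 0; i < writers.length; i++) {
            writers[i] = new Thread(() -> {
                for (int n = 0; n < 100; n++) {
                    out.write(new byte[511]); // same chunk size as the benchmark in this message
                    out.flush();
                }
            });
            writers[i].start();
        }
        for (Thread t : writers) {
            t.join();
        }
        System.out.println("last acked seqno: " + out.lastAcked());
    }
}
```

The point of the two locks is that the wait for the ack happens after the stream-level
lock is released, so a slow flush in one thread does not stop other threads from
queueing new data. How the real client tracks seqnos and acks may differ from this sketch.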

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
