hadoop-hdfs-issues mailing list archives

From "sam rash (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HDFS-895) Allow hflush/sync to occur in parallel with new writes to the file
Date Tue, 22 Jun 2010 00:59:59 GMT

    [ https://issues.apache.org/jira/browse/HDFS-895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12880991#action_12880991 ]

sam rash commented on HDFS-895:
-------------------------------

re: the patch

I realize the Hadoop code already swallows InterruptedException frequently, but I think
you can change the trend here:

{code}
// wait for all acks to be received back from datanodes
synchronized (ackQueue) {
  if (!closed && ackQueue.size() != 0) {
    try {
      ackQueue.wait();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();  // add this: restore the interrupt status
    }
    continue;
  }
}
{code}

Otherwise, it's very easy to have a thread that I own and manage, with a DFSOutputStream
in it, that swallows an interrupt. When I check Thread.currentThread().isInterrupted() to see
if one of my other threads has interrupted me, I will not see it (see the sketch below).

(the crux here is that swallowing interrupts in threads that Hadoop controls is less harmful--this
code runs directly in client threads when you call sync()/close())
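
To make the failure mode concrete, here's a minimal self-contained sketch (plain Java, not HDFS code -- the class and method names are made up for illustration) of an owner thread missing an interrupt that got swallowed inside a wait loop like the one above:

{code}
public class InterruptSwallowDemo {
  private static final Object ackQueue = new Object();

  // Mimics the current pattern: catches InterruptedException and drops it.
  static void swallowingWait() {
    synchronized (ackQueue) {
      try {
        ackQueue.wait(1000);
      } catch (InterruptedException e) {
        // swallowed -- the thread's interrupt status is now cleared
        // (the fix above would call Thread.currentThread().interrupt() here)
      }
    }
  }

  public static void main(String[] args) throws Exception {
    Thread owner = new Thread(() -> {
      swallowingWait();
      // The owner's check misses the interrupt because it was swallowed:
      System.out.println("interrupted? " + Thread.currentThread().isInterrupted()); // prints false
    });
    owner.start();
    owner.interrupt();
    owner.join();
  }
}
{code}

With the one-line fix above, the same check prints true, so the owning code can react to the interrupt.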


> Allow hflush/sync to occur in parallel with new writes to the file
> ------------------------------------------------------------------
>
>                 Key: HDFS-895
>                 URL: https://issues.apache.org/jira/browse/HDFS-895
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs client
>    Affects Versions: 0.22.0
>            Reporter: dhruba borthakur
>            Assignee: Todd Lipcon
>             Fix For: 0.22.0
>
>         Attachments: hdfs-895-0.20-append.txt, hdfs-895-20.txt, hdfs-895-trunk.txt, hdfs-895.txt
>
>
> In the current trunk, the HDFS client methods writeChunk() and hflush/sync are synchronized.
> This means that if a hflush/sync is in progress, an application cannot write data to the
> HDFS client buffer. This reduces the write throughput of the transaction log in HBase.
> The hflush/sync should allow new writes to happen to the HDFS client even when a hflush/sync
> is in progress. It can record the seqno of the message for which it should receive the ack,
> indicate to the DataStreamer thread to start flushing those messages, exit the synchronized
> section and just wait for that ack to arrive.
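
Reading the quoted description, the proposed hflush shape is roughly the following. This is only a sketch, not the attached patch; enqueueCurrentPacket, lastQueuedSeqno, and waitForAckedSeqno are illustrative names:

{code}
import java.io.IOException;

// A sketch of the proposed split: queue packets under the lock, wait outside it.
abstract class ParallelFlushSketch {
  private final Object lock = new Object();
  private long lastQueuedSeqno;  // illustrative: seqno of the last packet queued

  abstract void enqueueCurrentPacket();                            // illustrative: hand data to streamer
  abstract void waitForAckedSeqno(long seqno) throws IOException;  // illustrative

  public void hflush() throws IOException {
    long toWaitFor;
    synchronized (lock) {
      enqueueCurrentPacket();      // streamer thread starts flushing these packets
      toWaitFor = lastQueuedSeqno; // record the seqno whose ack completes the flush
    } // lock released: writeChunk() can keep buffering while we wait
    waitForAckedSeqno(toWaitFor);  // block outside the lock until that ack arrives
  }
}
{code}

The key design point is that the lock is held only long enough to queue packets and record the seqno; the potentially long wait for the pipeline ack happens outside it, so concurrent writeChunk() calls are not blocked.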

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

