hadoop-hdfs-issues mailing list archives

From "Todd Lipcon (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HDFS-895) Allow hflush/sync to occur in parallel with new writes to the file
Date Fri, 12 Nov 2010 00:26:16 GMT

    [ https://issues.apache.org/jira/browse/HDFS-895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12931280#action_12931280 ]

Todd Lipcon commented on HDFS-895:
----------------------------------

Hey Hairong. I had actually misremembered which part of that confusing code is new:
only the "currentSeqno--" code is new, and it is there to prevent skipping a sequence number.
Here's a diff that ignores whitespace changes:

{code}
       // Flush only if we haven't already flushed till this offset.
       if (lastFlushOffset != bytesCurBlock) {
-
+          assert bytesCurBlock > lastFlushOffset;
         // record the valid offset of this flush
         lastFlushOffset = bytesCurBlock;
-
-        // wait for all packets to be sent and acknowledged
-        flushInternal();
+          queueCurrentPacket();
       } else {
         // just discard the current packet since it is already been sent.
+          if (oldCurrentPacket == null && currentPacket != null) {
+            // If we didn't previously have a packet queued, and now we do,
+            // but we don't plan on sending it, then we should not
+            // skip a sequence number for it!
+            currentSeqno--;
+          }
         currentPacket = null;
       }
{code}

As you can see, we already had the code that avoided duplicate packets.
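
To make the seqno bookkeeping in the diff easier to follow, here is a toy model of it. This is only a sketch, not the real client code: the class name, the ensurePacket() helper, and the assumption that preparing a flush can allocate a packet even when no new data was written are all made up for illustration. Only the field names and the "currentSeqno--" rollback mirror the diff.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the flush bookkeeping; NOT the real HDFS output stream.
class FlushSeqnoSketch {
    private long currentSeqno = 0;      // next sequence number to hand out
    private long bytesCurBlock = 0;     // bytes written to the block so far
    private long lastFlushOffset = 0;   // offset covered by the last flush
    private Long currentPacket = null;  // seqno of the packet being filled, if any
    final List<Long> queuedSeqnos = new ArrayList<>();

    // Hypothetical helper: start a packet on demand, consuming a seqno.
    private void ensurePacket() {
        if (currentPacket == null) {
            currentPacket = currentSeqno++;
        }
    }

    void write(int numBytes) {
        ensurePacket();
        bytesCurBlock += numBytes;
    }

    void flush() {
        Long oldCurrentPacket = currentPacket;
        // Assumption: preparing a flush may allocate a packet even when
        // nothing new was written since the previous flush.
        ensurePacket();
        if (lastFlushOffset != bytesCurBlock) {
            // New data since the last flush: record the offset and queue the packet.
            lastFlushOffset = bytesCurBlock;
            queuedSeqnos.add(currentPacket);
            currentPacket = null;
        } else {
            // Nothing new to send. If this call allocated the packet,
            // give its sequence number back so no seqno is skipped.
            if (oldCurrentPacket == null && currentPacket != null) {
                currentSeqno--;
            }
            currentPacket = null;
        }
    }

    long nextSeqno() {
        return currentSeqno;
    }

    public static void main(String[] args) {
        FlushSeqnoSketch s = new FlushSeqnoSketch();
        s.write(10);
        s.flush();   // queues packet 0
        s.flush();   // nothing new: a packet is allocated, discarded, seqno returned
        s.write(5);
        s.flush();   // queues packet 1, so the queued seqnos stay contiguous
        System.out.println(s.queuedSeqnos);
    }
}
```

Without the rollback in the else branch, the second flush() in this model would burn seqno 1 and the next queued packet would carry seqno 2, which is the gap the new code is meant to avoid.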

> Allow hflush/sync to occur in parallel with new writes to the file
> ------------------------------------------------------------------
>
>                 Key: HDFS-895
>                 URL: https://issues.apache.org/jira/browse/HDFS-895
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs client
>    Affects Versions: 0.22.0
>            Reporter: dhruba borthakur
>            Assignee: Todd Lipcon
>             Fix For: 0.22.0
>
>         Attachments: 895-delta-for-review.txt, hdfs-895-0.20-append.txt, hdfs-895-20.txt,
hdfs-895-review.txt, hdfs-895-trunk.txt, hdfs-895.txt, hdfs-895.txt, hdfs-895.txt
>
>
> In the current trunk, the HDFS client methods writeChunk() and hflush/sync are synchronized.
This means that if an hflush/sync is in progress, an application cannot write data to the
HDFS client buffer. This reduces the write throughput of the transaction log in HBase.
> The hflush/sync should allow new writes to happen to the HDFS client even when an hflush/sync
is in progress. It can record the seqno of the message for which it should receive the ack,
indicate to the DataStreamer thread to start flushing those messages, exit the synchronized section,
and just wait for that ack to arrive.
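
The approach described in the issue can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the HDFS client: the class, its fields, and the sendAndAckOne() stand-in for the streamer thread are all hypothetical. The point is that hflush() only holds the lock long enough to note which seqno it needs acked, and the wait itself releases the monitor, so writers can keep queueing packets.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Simplified illustration; all names here are hypothetical.
class ParallelFlushSketch {
    private final Object lock = new Object();
    private final Queue<Long> dataQueue = new ArrayDeque<>();
    private long nextSeqno = 0;
    private long lastQueuedSeqno = -1;
    private long lastAckedSeqno = -1;

    // Queue one packet; the lock is held only for the bookkeeping.
    long write() {
        synchronized (lock) {
            long seqno = nextSeqno++;
            dataQueue.add(seqno);
            lastQueuedSeqno = seqno;
            lock.notifyAll();   // wake the streamer
            return seqno;
        }
    }

    // Record the seqno to wait for, leave the critical section, then wait.
    void hflush() {
        long toWaitFor;
        synchronized (lock) {
            toWaitFor = lastQueuedSeqno;
            lock.notifyAll();   // tell the streamer to start flushing
        }
        waitForAck(toWaitFor);
    }

    private void waitForAck(long seqno) {
        synchronized (lock) {
            try {
                while (lastAckedSeqno < seqno) {
                    lock.wait();   // releases the monitor, so writers are not blocked
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for ack", e);
            }
        }
    }

    // Stand-in for the streamer thread: "send" one packet and ack it at once.
    boolean sendAndAckOne() {
        synchronized (lock) {
            Long seqno = dataQueue.poll();
            if (seqno == null) {
                return false;
            }
            lastAckedSeqno = seqno;
            lock.notifyAll();   // wake any hflush() waiting on this ack
            return true;
        }
    }

    long lastAcked() {
        synchronized (lock) {
            return lastAckedSeqno;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        ParallelFlushSketch out = new ParallelFlushSketch();
        out.write();
        Thread flusher = new Thread(out::hflush);
        flusher.start();
        out.write();                      // not blocked by the pending hflush
        while (out.sendAndAckOne()) { }   // stand-in streamer drains the queue
        flusher.join();
        System.out.println("last acked seqno: " + out.lastAcked());
    }
}
```

A real client also has to handle pipeline errors and closed streams while waiting; the sketch leaves those out.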

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

