hadoop-hdfs-issues mailing list archives

From "Todd Lipcon (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-744) Support hsync in HDFS
Date Fri, 18 May 2012 22:15:10 GMT

    [ https://issues.apache.org/jira/browse/HDFS-744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13279255#comment-13279255 ]

Todd Lipcon commented on HDFS-744:
----------------------------------

Quickly looked at the patch. A few notes:

- I don't think it's a good idea to change SequenceFile.syncFs() to actually call through
to hsync(), given how much more expensive it is. This would be a "performance-incompatible
change". As much as it sucks, maybe we need to deprecate syncFs() and add a new method instead:
e.g. one that takes a boolean or an enum specifying what level of sync is needed.
- The javadoc for SYNC_BLOCK should explain that, although it's similar to O_SYNC, it differs
in that the block is only synced on close. It should also recommend that the user call hsync()
explicitly after each write if truly synchronous behavior is required.
- TestHSync is missing license header
- Metrics: we should have metrics for the number of hsyncs performed at each DN, as well as
the time spent in FileChannel.force. The functional tests could then verify that these metrics
are actually incremented after the hsync calls where expected.
- It seems wrong for the syncBlock flag to be on the packets themselves. Why do we need this
flag? Why not just have the client (or server) keep a flag that gets set whenever hsync()
has been called, and then also set the sync flag when the client sends the "last packet in
block"?
- Can you explain this logic to me?
{code}
+      if (syncPacket && !(syncBlock && lastPacketInBlock)) {
+        flushOrSync(true);
+      }
{code}
Is that to avoid the cost of a "double" sync on close? If so, can you add a comment saying as much?

- 
{code}
+  required bool syncBlock = 5;
+  required bool syncPacket = 6;
{code}
These flags should be optional with a default of false, so that we don't break client-server
compatibility with 2.0.0-alpha.
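For the syncFs() note, a minimal sketch of the deprecate-and-add approach, assuming (as the note implies) that syncFs() today only does the equivalent of hflush(). SyncableSketch, SyncLevel and SequenceFileWriterSketch are hypothetical names standing in for Hadoop's Syncable and SequenceFile.Writer so the example compiles on its own:

{code:java}
import java.io.IOException;

// Hypothetical stand-in for org.apache.hadoop.fs.Syncable.
interface SyncableSketch {
    void hflush() throws IOException; // data reaches all replicas, no disk sync
    void hsync() throws IOException;  // as hflush, plus an fsync on every replica
}

// Hypothetical enum naming the level of sync the caller needs.
enum SyncLevel { NONE, FLUSH, FSYNC }

// Hypothetical writer showing the deprecate-and-add pattern.
final class SequenceFileWriterSketch {
    private final SyncableSketch out;

    SequenceFileWriterSketch(SyncableSketch out) {
        this.out = out;
    }

    /** Kept for compatibility: behavior (and cost) unchanged, still just hflush(). */
    @Deprecated
    void syncFs() throws IOException {
        sync(SyncLevel.FLUSH);
    }

    /** New method: the caller states explicitly which level of sync it needs. */
    void sync(SyncLevel level) throws IOException {
        switch (level) {
            case NONE:
                break;
            case FLUSH:
                out.hflush();
                break;
            case FSYNC:
                out.hsync();
                break;
        }
    }
}
{code}

Existing callers of syncFs() keep paying only for an hflush(); only code that explicitly asks for SyncLevel.FSYNC pays for the fsync.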
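For the metrics note, a sketch of the two numbers being asked for (hsync count and time spent in force), using a plain local temp file in place of a DataNode block file. HsyncMetricsSketch and its fields are hypothetical, not Hadoop's metrics classes; FileChannel.force(true) is the real java.nio call the note refers to:

{code:java}
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical per-DataNode counters (illustrative names, not Hadoop's metrics).
final class HsyncMetricsSketch {
    final AtomicLong hsyncCount = new AtomicLong(); // number of hsyncs performed
    final AtomicLong forceNanos = new AtomicLong(); // total time spent in force()

    // Forces file data and metadata to the device, timing the call.
    void syncAndRecord(FileChannel channel) throws IOException {
        long start = System.nanoTime();
        channel.force(true);
        forceNanos.addAndGet(System.nanoTime() - start);
        hsyncCount.incrementAndGet();
    }

    // Writes one byte per iteration to a temp file and syncs after each write.
    static HsyncMetricsSketch runDemo(int writes) throws IOException {
        HsyncMetricsSketch metrics = new HsyncMetricsSketch();
        Path tmp = Files.createTempFile("hsync-demo", ".dat");
        try (FileChannel ch = FileChannel.open(tmp, StandardOpenOption.WRITE)) {
            for (int i = 0; i < writes; i++) {
                ch.write(ByteBuffer.wrap(new byte[] {(byte) i}));
                metrics.syncAndRecord(ch);
            }
        } finally {
            Files.deleteIfExists(tmp); // channel is already closed by try-with-resources
        }
        return metrics;
    }
}
{code}

A functional test along the lines the note suggests would then check that hsyncCount equals the number of hsync calls it made.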
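The alternative to a per-packet syncBlock flag can be sketched as a client-side "sticky" flag that only the last packet in the block carries. PacketSketch and ClientStreamSketch are hypothetical classes modeling just the fields relevant to the question, not the real DFSOutputStream or packet header:

{code:java}
import java.util.ArrayList;
import java.util.List;

// Hypothetical packet carrying only the fields relevant here.
final class PacketSketch {
    final boolean lastPacketInBlock;
    final boolean syncBlock;

    PacketSketch(boolean lastPacketInBlock, boolean syncBlock) {
        this.lastPacketInBlock = lastPacketInBlock;
        this.syncBlock = syncBlock;
    }
}

// Hypothetical client stream: remembers whether hsync() was ever called,
// instead of tagging every data packet with a sync-block flag.
final class ClientStreamSketch {
    private boolean hsyncCalled = false;
    final List<PacketSketch> sent = new ArrayList<>();

    void writePacket() {
        sent.add(new PacketSketch(false, false)); // data packets never carry the flag
    }

    void hsync() {
        hsyncCalled = true; // the real flush-and-sync work is omitted here
    }

    void endBlock() {
        // Only the "last packet in block" carries the sync flag.
        sent.add(new PacketSketch(true, hsyncCalled));
    }
}
{code}

Here the flag is never cleared; whether it should reset at each block boundary, or also be set by a create flag such as SYNC_BLOCK, is a design choice the question leaves open.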
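The syncPacket condition quoted in the review can be pulled into a small hypothetical helper so its cases are explicit. This follows the reading proposed in the question (skip the per-packet sync when the block will be synced on close anyway); it is not a statement of what the patch author intended:

{code:java}
// Hypothetical helper isolating the condition from the patch so each case can be checked.
final class SyncDecisionSketch {
    // Sync this packet now unless the whole block is about to be synced on close
    // anyway (syncBlock && lastPacketInBlock); under the reading asked about in
    // the review, skipping that one case avoids a "double" sync.
    static boolean syncNow(boolean syncPacket, boolean syncBlock, boolean lastPacketInBlock) {
        return syncPacket && !(syncBlock && lastPacketInBlock);
    }
}
{code}

The only case where a packet with syncPacket set is not synced immediately is when syncBlock is also set and the packet is the last one in the block.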
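Why the required fields would break compatibility can be shown with a toy decoder. FieldDecodeSketch is hypothetical and is not protobuf code; it only models proto2's treatment of a field missing from the wire. A 2.0.0-alpha client never sends syncBlock or syncPacket, so with required fields the newer side rejects its messages, while optional fields with a default of false simply read as false:

{code:java}
import java.util.Map;

// Toy model of decoding a bool field that may be absent from the wire.
// Not protobuf itself: it only mirrors proto2 semantics for missing fields.
final class FieldDecodeSketch {
    // "required": a missing field makes the whole message invalid.
    static boolean requiredBool(Map<String, Boolean> wire, String name) {
        Boolean v = wire.get(name);
        if (v == null) {
            throw new IllegalStateException("missing required field: " + name);
        }
        return v;
    }

    // "optional" with a default: a missing field silently takes the default.
    static boolean optionalBool(Map<String, Boolean> wire, String name, boolean dflt) {
        Boolean v = wire.get(name);
        return v == null ? dflt : v;
    }
}
{code}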
                
> Support hsync in HDFS
> ---------------------
>
>                 Key: HDFS-744
>                 URL: https://issues.apache.org/jira/browse/HDFS-744
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: data-node, hdfs client
>            Reporter: Hairong Kuang
>            Assignee: Lars Hofhansl
>         Attachments: HDFS-744-trunk-v2.patch, HDFS-744-trunk-v3.patch, HDFS-744-trunk-v4.patch,
>                      HDFS-744-trunk-v5.patch, HDFS-744-trunk.patch, hdfs-744-v2.txt, hdfs-744-v3.txt, hdfs-744.txt
>
>
> HDFS-731 implements hsync by default as hflush. As described in HADOOP-6313, the expected
> semantics should be "flushes out to all replicas and all replicas have done posix
> fsync equivalent - ie the OS has flushed it to the disk device (but the disk may have it in
> its cache)." This jira aims to implement the expected behaviour.
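The difference between the two guarantees described above can be modeled with a toy pipeline. ReplicaSketch and PipelineSketch are hypothetical; a real HDFS pipeline forwards packets from DataNode to DataNode and acknowledges back to the client, which this sequential loop glosses over:

{code:java}
import java.util.List;

// Hypothetical replica tracking how much data it holds and how much is synced.
final class ReplicaSketch {
    long received = 0; // bytes received by this replica (visible to readers after hflush)
    long synced = 0;   // bytes fsynced: handed to the disk device, which may still cache them
}

// Hypothetical pipeline contrasting the hflush and hsync guarantees.
final class PipelineSketch {
    // hflush: every replica has the data, but none is required to sync it.
    static void hflush(List<ReplicaSketch> replicas, long bytes) {
        for (ReplicaSketch r : replicas) {
            r.received += bytes;
        }
    }

    // hsync: every replica has the data and has done the fsync equivalent.
    static void hsync(List<ReplicaSketch> replicas, long bytes) {
        for (ReplicaSketch r : replicas) {
            r.received += bytes;
            r.synced = r.received;
        }
    }
}
{code}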

--
This message is automatically generated by JIRA.

        
