hadoop-hdfs-issues mailing list archives

From "Tsz Wo (Nicholas), SZE (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-3979) Fix hsync and hflush semantics.
Date Fri, 05 Oct 2012 01:31:48 GMT

    [ https://issues.apache.org/jira/browse/HDFS-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13469952#comment-13469952 ]

Tsz Wo (Nicholas), SZE commented on HDFS-3979:

> For applications like HBase we'd like API4 as well as API5.
> (API4 allows a hypothetical kill -9 of all DNs without loss of acknowledged data, API5 allows
> HW failures of all data nodes - i.e. a DC outage - without loss of acknowledged data)

Why is API4 needed for HBase?

As everyone knows, there are usually 3 replicas in HDFS.  If only one of the datanodes is
killed, the data is still available on the other two datanodes.  That's why we invented
"hflush" (i.e. API 3) in HDFS-265.

> Fix hsync and hflush semantics.
> -------------------------------
>                 Key: HDFS-3979
>                 URL: https://issues.apache.org/jira/browse/HDFS-3979
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: data-node, hdfs client
>    Affects Versions: 0.22.0, 0.23.0, 2.0.0-alpha
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>         Attachments: hdfs-3979-sketch.txt, hdfs-3979-v2.txt
> See discussion in HDFS-744. The actual sync/flush operation in BlockReceiver is not on
> a synchronous path from the DFSClient, hence it is possible that a DN loses data that it has
> already acknowledged as persisted to a client.
> Edit: Spelling.
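
As a rough illustration of the durability levels discussed in this thread, the following toy Java model treats each replica as a process buffer plus a persisted store. All class and method names are hypothetical; this is not HDFS code. It assumes that killing a datanode process discards its buffer and that a reader sees the longest surviving replica. Under those assumptions, data that was only flushed to the replicas survives one node being killed but not all three being killed at once, while data persisted before the acknowledgment survives both.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: these are hypothetical names, not HDFS classes or APIs.
class ReplicaSketch {

    /** One hypothetical datanode holding a replica of the block being written. */
    static final class Node {
        // Assumed to be lost if the datanode process is killed.
        private final StringBuilder processBuffer = new StringBuilder();
        // Assumed to survive a process kill.
        private final StringBuilder persisted = new StringBuilder();

        void receive(String data) {
            processBuffer.append(data);
        }

        void persist() {
            persisted.append(processBuffer);
            processBuffer.setLength(0);
        }

        void killAndRestart() {
            processBuffer.setLength(0);
        }

        String contents() {
            return persisted.toString() + processBuffer;
        }
    }

    /**
     * Writes one record to every replica, kills the first `killed` nodes,
     * and reports whether the record is still readable afterwards.
     */
    static boolean survives(int replicas, int killed, boolean persistBeforeAck) {
        List<Node> nodes = new ArrayList<>();
        for (int i = 0; i < replicas; i++) {
            nodes.add(new Node());
        }

        String record = "edit-1";
        for (Node n : nodes) {
            n.receive(record);
            if (persistBeforeAck) {
                n.persist();
            }
        }
        // At this point the client is assumed to have received its acknowledgment.

        for (int i = 0; i < killed; i++) {
            nodes.get(i).killAndRestart();
        }

        // Simplification: the reader sees the longest surviving replica.
        String best = "";
        for (Node n : nodes) {
            String c = n.contents();
            if (c.length() > best.length()) {
                best = c;
            }
        }
        return best.equals(record);
    }

    public static void main(String[] args) {
        System.out.println("flushed only, 1 of 3 killed:  " + survives(3, 1, false));
        System.out.println("flushed only, 3 of 3 killed:  " + survives(3, 3, false));
        System.out.println("persisted,    3 of 3 killed:  " + survives(3, 3, true));
    }
}
```

In this model the three cases print true, false and true respectively. The first case corresponds to the replication argument in the comment, the second to the "kill -9 of all DNs" scenario raised for API4.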

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
