hadoop-common-dev mailing list archives

From "Hairong Kuang (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-5744) Revisit append
Date Mon, 18 May 2009 22:38:45 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-5744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12710523#action_12710523 ]

Hairong Kuang commented on HADOOP-5744:

Dhruba told me that hbase depends on a client calling append to trigger the close of a
file that has lost its writer. Once the file is closed, the client reads the file and resumes
work from the state defined in the closed file.

My question is: why does the file need to be closed before it is read? The read semantics
defined in this jira guarantee that
(1) any hflushed data becomes visible to any new reader;
(2) once a byte becomes visible to a reader, it continues to be visible to that reader unless
all replicas containing the byte fail. This implies that a reader continues to see a byte it
has seen before even when the replica it read from fails, both during and after any error
recovery, as long as one replica containing the byte remains available.
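To make guarantees (1) and (2) concrete, here is a toy, in-memory model of the visibility contract. This is not HDFS code; all names are illustrative. The writer's hflush publishes a monotonically growing visible length, and each reader caches the largest length it has observed, so a byte once seen never disappears from that reader's view (total replica loss, the one exception, is not simulated here):

```java
import java.util.concurrent.atomic.AtomicLong;

class VisibleLengthModel {
    // Length published by the writer's most recent hflush; never shrinks.
    private final AtomicLong hflushed = new AtomicLong(0);

    // Writer side: make everything up to len visible to new readers (1).
    void hflush(long len) {
        hflushed.accumulateAndGet(len, Math::max);
    }

    // Reader side: the view starts at the last hflushed length and can
    // only grow afterwards (2).
    class Reader {
        private long seen = hflushed.get();

        long visibleLength() {
            seen = Math.max(seen, hflushed.get());
            return seen;
        }
    }

    Reader openReader() {
        return new Reader();
    }

    public static void main(String[] args) {
        VisibleLengthModel f = new VisibleLengthModel();
        f.hflush(100);
        Reader r = f.openReader();
        System.out.println(r.visibleLength()); // 100: hflushed data visible
        f.hflush(250);
        System.out.println(r.visibleLength()); // 250: the view only grows
    }
}
```

Under this model there is no need to wait for close: a reader acting on a visible prefix of the file is acting on data that stays visible for as long as one replica holding it survives.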

Is it OK for your client to trigger the close of the file but not wait for it to close?
The idea is to read the file and resume working before the file actually gets closed. When
the file finally gets closed, the file
(1) may have more bytes than when it was previously read. This is the normal case. Would
this be an issue for hbase?
(2) may end up with fewer bytes, if all replicas go down between the time the close is
triggered and the time it completes. This is a rare case. The default timeout for this
period is 10 minutes, so the chance of losing visible bytes is very slim. Can hbase
tolerate this?
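The two outcomes above imply a simple reconciliation step once the close completes. A minimal sketch, assuming the client tracks how many bytes it has already consumed (all names here are illustrative, not HBase or HDFS APIs):

```java
// Reconciliation after the file is finally closed: bytesConsumed is what
// the client already read before the close, closedLength is the file's
// final length.
class CloseReconciler {
    static String reconcile(long bytesConsumed, long closedLength) {
        if (closedLength >= bytesConsumed) {
            // Case (1), the normal case: the closed file contains at least
            // the bytes already consumed; only the unread tail is new work.
            return "process " + (closedLength - bytesConsumed) + " new bytes";
        }
        // Case (2), the rare case: all replicas failed during recovery and
        // some previously visible bytes were lost; start over from the
        // closed file's actual contents.
        return "file truncated to " + closedLength + " bytes; re-read from start";
    }

    public static void main(String[] args) {
        System.out.println(reconcile(100, 180)); // normal case: 80 new bytes
        System.out.println(reconcile(100, 60));  // rare case: truncation
    }
}
```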

> Revisit append
> --------------
>                 Key: HADOOP-5744
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5744
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: dfs
>    Affects Versions: 0.20.0
>            Reporter: Hairong Kuang
>            Assignee: Hairong Kuang
>             Fix For: 0.21.0
>         Attachments: AppendSpec.pdf
> HADOOP-1700 and related issues put a lot of effort into providing the first implementation
> of append. However, append is a complex feature, and it turns out that some issues that
> initially seemed trivial need a careful design. This jira revisits append, aiming for a
> design and implementation that support semantics acceptable to append's users.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
