hadoop-common-dev mailing list archives

From "Jim Kellerman (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-5744) Revisit append
Date Mon, 18 May 2009 23:00:45 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-5744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12710527#action_12710527

Jim Kellerman commented on HADOOP-5744:

The problem we need to solve is:
- process writing a file crashes after doing some sync() operations
- another process knows that the writer has crashed and needs to
-- recover the lease (immediately)
-- be able to read all the data up to the last sync()
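The requirement can be read as an invariant: once the writer has crashed and another process has recovered the lease, that process must see every byte written before the last successful sync(). The sketch below models only that invariant, in memory. `SyncedLog`, `crash()` and `readAfterRecovery()` are hypothetical names for illustration, not HDFS APIs. The model deliberately drops everything after the last sync(), which is the worst case a reader has to tolerate.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical in-memory model of the requirement above; not an HDFS API.
final class SyncedLog {
    private final ByteArrayOutputStream written = new ByteArrayOutputStream();
    private int syncedLength = 0;      // prefix that sync() has made readable
    private boolean writerAlive = true;

    void write(String record) {
        checkWriterAlive();
        byte[] bytes = record.getBytes(StandardCharsets.UTF_8);
        written.write(bytes, 0, bytes.length);   // buffered only; no guarantee yet
    }

    // Marks everything written so far as readable by other processes.
    void sync() {
        checkWriterAlive();
        syncedLength = written.size();
    }

    // The writer process dies; bytes after the last sync() carry no guarantee.
    void crash() {
        writerAlive = false;
    }

    // What another process sees after recovering the lease, in the worst case:
    // only the synced prefix survives.
    String readAfterRecovery() {
        if (writerAlive) {
            throw new IllegalStateException("lease is still held by the writer");
        }
        byte[] synced = Arrays.copyOf(written.toByteArray(), syncedLength);
        return new String(synced, StandardCharsets.UTF_8);
    }

    private void checkWriterAlive() {
        if (!writerAlive) {
            throw new IllegalStateException("writer has crashed");
        }
    }
}

final class SyncedLogDemo {
    public static void main(String[] args) {
        SyncedLog log = new SyncedLog();
        log.write("edit-1;");
        log.sync();
        log.write("edit-2;");   // never synced
        log.crash();
        System.out.println(log.readAfterRecovery());   // prints edit-1;
    }
}
```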

Of the APIs described in Sanjay's email below, APIs 1-2 are inadequate. API3 is OK provided
HDFS does not fail. API4 works if the datanode(s) fail, but not if the machine crashes. Only
API5 guarantees that we can read the data that has been sync'd.
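The assessment above amounts to an ordering: each of the five levels Sanjay lists below leaves the data in a more durable place, and each extra failure that accompanies the writer crash raises the weakest acceptable level. The sketch encodes one reading of that assessment. The enum names `FlushLevel` and `Failure` and their constants are hypothetical labels for illustration, not HDFS types.

```java
// Hypothetical model of the five flush levels listed in the quoted email; not an HDFS API.
enum FlushLevel {
    API1_CLIENT_SOCKET,      // left the client's address space; no DN ack
    API2_ONE_DATANODE,       // acked by at least one DN
    API3_ALL_DN_BUFFERS,     // in the process buffers of every replica DN (hflush)
    API4_ALL_DN_OS_CACHE,    // fflush equivalent done on every replica DN
    API5_ALL_DN_DISK;        // fsync equivalent done on every replica DN

    // Weakest level that still lets a reader see all synced data after the writer
    // crashes, given what else fails (one reading of the assessment above).
    static FlushLevel weakestSafeLevel(Failure alsoFailed) {
        switch (alsoFailed) {
            case NOTHING_ELSE:
                return API3_ALL_DN_BUFFERS;
            case DATANODE_PROCESSES:
                return API4_ALL_DN_OS_CACHE;
            case DATANODE_MACHINES:
                return API5_ALL_DN_DISK;
            default:
                throw new IllegalArgumentException("unknown failure: " + alsoFailed);
        }
    }

    // Levels are declared weakest first, so declaration order is guarantee order.
    boolean survives(Failure alsoFailed) {
        return compareTo(weakestSafeLevel(alsoFailed)) >= 0;
    }
}

// What fails in addition to the writing process.
enum Failure {
    NOTHING_ELSE,         // HDFS itself stays up
    DATANODE_PROCESSES,   // every replica DN process dies; their OS keeps running
    DATANODE_MACHINES     // every replica DN machine crashes or loses power
}

final class FlushLevelDemo {
    public static void main(String[] args) {
        for (Failure f : Failure.values()) {
            System.out.println(f + " -> at least " + FlushLevel.weakestSafeLevel(f));
        }
    }
}
```

Modeling the levels as an ordered enum makes "API N is enough" a single comparison, which mirrors how the comment reasons about them.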

> From: Sanjay Radia [mailto:sradia@yahoo-inc.com] 
> Sent: Thursday, May 14, 2009 1:46 PM
> To: Jim Kellerman (POWERSET)
> Cc: Michael Stack; Chad Walters; Dhruba Borthakur; Sameer Paranjpye; Hairong Kuang; Robert
> Subject: Re: Append, flush, sync write and HBase
>> On May 13, 2009, at 1:20 PM, Jim Kellerman (POWERSET) wrote:
>> What we need are two things:
>> 1. When we call sync() we want to be assured that any buffered data can be read by
>> another process.
> Actually I wanted to have a larger discussion to understand your
> current and future requirements on append/flush/sync, and also on the latency
> of HDFS. I am trying to document current and future
> requirements, which is why I wanted to have a quick chat on the phone. I
> will try this via email for now.
> BTW Hairong @ Y! is driving the re-implementation of append. See
> HADOOP-5744. She has defined the semantics she is considering.  Please
> comment whether you agree or disagree.
> We are also looking at variations on the semantics that may have lower
> latencies and weaker guarantees. We would like to get your initial
> feedback. Eventually we will update the Jira when we have the semantics
> and APIs better formulated.
> Below is a list of API/semantics variations we are considering.
> Which ones do you absolutely need for HBase in the short term, and
> which ones may be useful to HBase in the longer term?
> API1: flushes out from the address space of the client into the socket to the data nodes.
>     On return from the call there is no guarantee that the data is
>     out of the underlying node and no guarantee that it has reached a
>     DN. Readers will see this data soon if there are no failures.
>     For example, I suspect Scribe and Chukwa will like the lower
>     latency of this API and are prepared to lose some records
>     occasionally in case of failures. Clearly a journal will not find
>     this API acceptable.
> API2: flushes out to at least one data node and receives an ack.
>     New readers will eventually see the data.
> API3: flushes out to all replicas of the block. The data is in the buffers of the DNs
>    but not in the DNs' OS buffers.
>    New readers will see the data after the call has returned. (HADOOP-5744
>    calls API3 hflush for now.)
> API4: flushes out to all replicas, and all replica DNs have done a POSIX fflush equivalent,
>    i.e. the data is out in the underlying OS file system of the DNs.
> API5: flushes out to all replicas, and all replicas have done a POSIX fsync equivalent,
>    i.e. the OS has flushed it to the disk device (but the disk may have it in its cache).
> Does the HBase edits journal require API 3, 4 or 5?
> What are your latency requirements for the write operation? For
> example, can you tolerate occasional larger latency for the
> fflush/fsync operation?
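On a single machine, the API4/API5 distinction is the difference between pushing bytes out of the process into the OS page cache and forcing them to the storage device. Java's standard library exposes both steps: `BufferedOutputStream.flush()` for the first and `FileDescriptor.sync()` (an fsync-like call) for the second. The sketch below is a local analogy only, not HDFS code; `LocalDurability`, `appendRecord` and the `Level` values are hypothetical names.

```java
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

// Local, single-machine analogy of the API4/API5 distinction; not HDFS code.
final class LocalDurability {
    enum Level { OS_CACHE, DISK }

    // Appends a record and returns only once it has reached the requested level.
    static void appendRecord(File f, String record, Level level) {
        try (FileOutputStream fos = new FileOutputStream(f, true);
             BufferedOutputStream out = new BufferedOutputStream(fos)) {
            out.write(record.getBytes(StandardCharsets.UTF_8));
            // flush(): bytes leave the JVM's buffer and enter the OS (fflush-like).
            // Other processes on this machine can read them, but a power loss
            // can still lose them.
            out.flush();
            if (level == Level.DISK) {
                // sync(): fsync-like; asks the OS to push the bytes to the storage
                // device. The device's own write cache may still hold them.
                fos.getFD().sync();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static String readAll(File f) {
        try {
            return new String(Files.readAllBytes(f.toPath()), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

final class LocalDurabilityDemo {
    public static void main(String[] args) throws IOException {
        File journal = File.createTempFile("journal", ".log");
        journal.deleteOnExit();
        LocalDurability.appendRecord(journal, "edit-1\n", LocalDurability.Level.OS_CACHE);
        LocalDurability.appendRecord(journal, "edit-2\n", LocalDurability.Level.DISK);
        System.out.print(LocalDurability.readAll(journal));
    }
}
```

The extra latency asked about above comes from the `sync()` step, which waits on the device; `flush()` typically returns once the OS has copied the bytes.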

> Revisit append
> --------------
>                 Key: HADOOP-5744
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5744
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: dfs
>    Affects Versions: 0.20.0
>            Reporter: Hairong Kuang
>            Assignee: Hairong Kuang
>             Fix For: 0.21.0
>         Attachments: AppendSpec.pdf
> HADOOP-1700 and related issues put a lot of effort into providing the first implementation
> of append. However, append is a complex feature: it turns out that some issues that
> initially seemed trivial need a careful design. This jira revisits append, aiming
> for a design and implementation supporting semantics that are acceptable to its users.

This message is automatically generated by JIRA.
