Subject: Hadoop Committers Meeting at Yahoo on append/flush/sync
From: stack <saint.ack@gmail.com>
To: hbase-dev@hadoop.apache.org
Date: Sat, 23 May 2009 16:01:51 -0700

A few of us went to a Hadoop Committers Meeting kindly hosted by Yahoo! yesterday. HBase was represented by Chad Walters, Jim Kellerman, Ryan Rawson, and myself. The rest of the attendees were a bunch of the Y! HDFS team, plus meeting leader MapReducer Owen O'Malley, along with Facebookees (Dhruba, Ashish, etc.) and Luke Liu of HyperTable/Zvents.

The meeting topic was append/flush/sync in HDFS. After some back and forth over a set of slides presented by Sanjay on the work being done by Hairong as part of HADOOP-5744, "Revising append", the room settled on API3 from the list of options below as the priority feature needed in Hadoop 0.21.0: readers must be able to read up to the writer's last 'successful' flush, and it is not important that the reported file length be inexact in the meantime.
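To make the API3 semantics concrete, here is a rough writer/reader sketch against the FileSystem API. It assumes the call lands on FSDataOutputStream under the 'hflush' name HADOOP-5744 proposes (still provisional), and the path is made up:

    // Sketch only: assumes HADOOP-5744's 'hflush' lands on FSDataOutputStream
    // with API3 semantics (visible to new readers once the call returns).
    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HflushSketch {
      public static void main(String[] args) throws IOException {
        FileSystem fs = FileSystem.get(new Configuration());
        Path log = new Path("/hbase/wal-sketch");  // hypothetical WAL path

        FSDataOutputStream out = fs.create(log);
        byte[] edit = "put row1/cf:col".getBytes();
        out.write(edit);
        out.hflush();  // API3: returns once every replica DN has buffered the bytes

        // A reader opened now must see everything up to that hflush, even
        // though the namenode may still report a stale (inexact) file length.
        FSDataInputStream in = fs.open(log);
        byte[] buf = new byte[edit.length];
        in.readFully(0, buf);  // positioned read (pread): open socket, read, close
        in.close();

        out.close();
        fs.close();
      }
    }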
Hairong's revisit builds on the work done in HADOOP-4379, etc., but is a different effort. It was reported that the latest HADOOP-4379 patch works pretty well and is a million times better than nothing, though there is some lag while the lease is recovered (Hairong and Dhruba, chatting, think the cycle of waiting on a successful append so we can close, and then opening to read, may not actually be necessary -- they will update HADOOP-4379 after trying it out). Dhruba notes that HADOOP-4379 alone is not enough; HADOOP-4663 is also needed. We still need to test, but from the discussion, a patched Hadoop 0.20.0 with a working flush may be possible.

Before the above meeting, a few of us met with the Y! HDFS team to chat. On DFSClient recovery, while in the room, Raghu may have fingered our problem: HADOOP-5903. On xceiver count, because TRUNK uses pread in HDFS, the number of occupied threads in the datanodes may actually be much lower, since pread opens a socket, reads, and then closes the socket. We need to test. On occasional slow writes into HDFS, we need to check what the datanode is doing at the time.

St.Ack

Below are the options presented by Sanjay:

> Below is a list of the API/semantics variations we are considering.
> Which ones do you absolutely need for HBase in the short term, and
> which ones may be useful to HBase in the longer term?
>
> API1: flushes out from the address space of the client into the socket to the datanodes.
>
> On the return of the call there is no guarantee that the data is
> out of the underlying node, and no guarantee of it having reached a
> DN. Readers will see this data soon if there are no failures.
>
> For example, I suspect Scribe and Chukwa will like the lower
> latency of this API and are prepared to lose some records
> occasionally in case of failures. Clearly a journal will not find
> this API acceptable.
>
> API2: flushes out to at least one datanode and receives an ack.
>
> New readers will eventually see the data.
>
> API3: flushes out to all replicas of the block. The data is in the buffers of the DNs but not yet in the DNs' OS buffers.
>
> New readers will see the data after the call has returned.
> (HADOOP-5744 calls API3 'hflush' for now.)
>
> API4: flushes out to all replicas, and all replica DNs have done a posix fflush equivalent -- i.e., the data is out to the underlying OS file system of the DNs.
>
> API5: flushes out to all replicas, and all replicas have done a posix fsync equivalent -- i.e., the OS has flushed it to the disk device (though the disk may still have it in its cache).
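If the spectrum above hardens into an API, the writer-side choice might reduce to something like the sketch below. Only the API3 name ('hflush') comes from HADOOP-5744; 'hsync' is hypothetical shorthand here for an API4/API5-style durable flush, and plain flush() is just the closest existing analogue to API1:

    // Sketch mapping Sanjay's options onto writer-side calls. Nothing here
    // is committed API yet; names other than hflush are placeholders.
    import java.io.IOException;
    import org.apache.hadoop.fs.FSDataOutputStream;

    public final class FlushPolicy {
      private FlushPolicy() {}

      /** API1-style: push what the client has buffered; no DN ack awaited. */
      static void flushLossy(FSDataOutputStream out) throws IOException {
        out.flush();  // data may not have reached any DN; records can be lost
      }

      /** API3: all replica DNs have the data buffered when this returns. */
      static void flushVisible(FSDataOutputStream out) throws IOException {
        out.hflush();  // the name HADOOP-5744 proposes for API3
      }

      /** API4/API5 territory: fflush/fsync equivalent on every replica's OS. */
      static void flushDurable(FSDataOutputStream out) throws IOException {
        out.hsync();  // hypothetical; no such call is named in the slides yet
      }
    }

Per the discussion above, API3 (flushVisible here) is the one HBase needs in 0.21.0; the API4/API5 variants trade latency for journal-grade durability.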