Subject: Re: [jira] Commented: (HADOOP-1700) Append to files in HDFS
From: Jim Kellerman
To: hadoop-dev@lucene.apache.org
In-Reply-To: <27564002.1188970712868.JavaMail.root@brutus>
Date: Tue, 04 Sep 2007 22:46:25 -0700
Message-Id: <1188971185.3484.2.camel@vermin.localdomain>

+1!

On Tue, 2007-09-04 at 22:38 -0700, eric baldeschwieler (JIRA) wrote:
> [ https://issues.apache.org/jira/browse/HADOOP-1700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12524988 ]
>
> eric baldeschwieler commented on HADOOP-1700:
> ---------------------------------------------
>
> Just wanted to pitch in some context...
>
> Jim stated in the opening of this bug that a single client writing would be enough to address this issue. I agree. But we should be clearer about the ultimate desired semantics for readers. I'd define success as having a single client doing appends and flushes as desired (say, per line in a log file), and having multiple clients "tail -f" the file and see updates at a reasonable rate, i.e. soon after each flush or every 64 KB or so, with less than a second's latency.
>
> This would let us build systems that log directly into HDFS and have related systems respond based on those log streams.
>
> This is where I'd like to see us get with this issue. Clearly getting there involves getting a handle on all the stuff already discussed in this thread. We also need to think carefully about the pipelining and protocol issues involved in making this work.
>
> We might want to break the protocol change issues into another discussion, but I want to make sure we don't converge on solutions that will not work considering fine-grained "flushes".
>
> > Append to files in HDFS
> > -----------------------
> >
> >                 Key: HADOOP-1700
> >                 URL: https://issues.apache.org/jira/browse/HADOOP-1700
> >             Project: Hadoop
> >          Issue Type: New Feature
> >          Components: dfs
> >            Reporter: stack
> >
> > The ability to append to files in HDFS has been requested a couple of times on the list of late. For one example, see http://www.nabble.com/HDFS%2C-appending-writes-status-tf3848237.html#a10916193. Other mail describes folks' workarounds because this feature is lacking: e.g. http://www.nabble.com/Loading-data-into-HDFS-tf4200003.html#a12039480 (later on this thread, Jim Kellerman re-raises the HBase need for this feature). HADOOP-337 'DFS files should be appendable' mentions file append, but it was opened early in the life of HDFS when the focus was more on implementing the basics than on adding new features, and interest fizzled. Because HADOOP-337 is also a bit of a grab-bag -- it includes truncation and concurrent read/write -- rather than try to breathe new life into HADOOP-337, here is a new issue focused on file append. Ultimately, being able to do what the Google GFS paper describes -- having multiple concurrent clients make 'Atomic Record Append' to a single file -- would be sweet, but for a first cut at this feature, IMO, a single client appending to a single HDFS file, letting the application manage the access, would be sufficient.

--
Jim Kellerman, Senior Engineer; Powerset
jim@powerset.com
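A rough Java sketch of the single-writer / "tail -f" reader pattern Eric describes above. It assumes the append()/hflush() client API shape that later Hadoop releases settled on (FileSystem.append(), FSDataOutputStream.hflush()); no such API existed at the time of this thread, and the polling tail loop and class name here are purely illustrative of the intended semantics, not a working implementation against the HDFS of the day.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class AppendTailSketch {

      // Single writer: append one log line, then flush so tailing readers can see it.
      // FileSystem.append() and hflush() are the API shape of later releases, assumed here.
      static void appendLine(FileSystem fs, Path log, String line) throws Exception {
        FSDataOutputStream out = fs.append(log);
        try {
          out.write((line + "\n").getBytes("UTF-8"));
          out.hflush();   // push buffered data down the pipeline before returning
        } finally {
          out.close();
        }
      }

      // Reader emulating "tail -f": poll the visible file length and print anything new.
      // How quickly flushed-but-unclosed data becomes visible to getLen()/readers is
      // exactly the semantics this thread is debating.
      static void tail(FileSystem fs, Path log) throws Exception {
        long offset = 0;
        byte[] buf = new byte[64 * 1024];
        while (true) {
          long len = fs.getFileStatus(log).getLen();
          if (len > offset) {
            FSDataInputStream in = fs.open(log);
            in.seek(offset);
            int n;
            while (offset < len && (n = in.read(buf)) > 0) {
              System.out.write(buf, 0, n);
              offset += n;
            }
            System.out.flush();
            in.close();
          }
          Thread.sleep(500);   // poll interval chosen for sub-second visibility after a flush
        }
      }

      public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path log = new Path(args[0]);
        if (args.length > 1) {
          appendLine(fs, log, args[1]);   // writer mode: append one line
        } else {
          tail(fs, log);                  // reader mode: follow the file
        }
      }
    }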