From: "Paul Sutter"
To: hadoop-user@lucene.apache.org
Subject: Re: What about append in hadoop files ?
Date: Fri, 14 Jul 2006 11:24:44 -0700

When I first started using Hadoop, I was shocked and disturbed that the
append functionality didn't exist. But as it turns out, we've had no
problem at all working around it. I have grown to really like the simple
atomicity of the current feature set.

On 7/14/06, Konstantin Shvachko wrote:
> Eric,
>
> I remember Doug advised somebody on a related issue to use a directory
> instead of a file for long lasting appends.
> You can logically divide your output into smaller files and close them
> whenever the logical boundary is reached.
> The directory can be treated as a collection of records. Maybe this
> will work for you.
> IMO the concurrent append feature is a high priority task.
>
> --Konstantin
>
> Doug Cutting wrote:
>
> > drwho wrote:
> >
> >> If so, GFS, is also suitable only for large, offline, batch
> >> computations ?
> >> I wonder how Google is going to use GFS for writely or their online
> >> spreadsheet or their BigTable (their gigantic relational DB).
> >
> > Did I say anything about GFS? I don't think so. Also, I said,
> > "currently" and "primarily", not "forever" and "exclusively". I would
> > love for DFS to be more suitable for online, incremental stuff, but
> > we're a ways from that right now. As I said, we're pursuing
> > reliability, scalability and performance before features like append.
> > If you'd like to try to implement append w/o disrupting work on
> > reliability scalability and performance, we'd welcome your
> > contributions. The project direction is determined by contributors.
> >
> > Note that BigTable is a complex layer on top of GFS that caches and
> > batches i/o. So, while GFS does implement some features that DFS
> > still does not (like appends), GFS is probably not used directly by,
> > e.g., writely. Finally, BigTable is not relational.
> >
> > Doug
> >
> >> Doug Cutting wrote:
> >>
> >> DFS is currently primarily used to support large, offline, batch
> >> computations. For example, a log of critical data with tight
> >> transactional requirements is probably an inappropriate use of DFS at
> >> this time. Again, this may change, but that's where we are now.
> >>
> >> Doug
> >>
> >> Thanks much.
> >>
> >> -eric
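
For anyone reading this thread in the archive, here is a rough sketch of
the directory-instead-of-file workaround quoted above. It is only an
illustration under stated assumptions: it uses the local filesystem as a
stand-in for DFS, and the class name RollingDirectoryWriter, the part-NNNNN
file naming, the .tmp suffix and the records_per_part "logical boundary"
are all made up for this example, not anything Hadoop provides. Records are
assumed to be single-line strings.

```python
import os
import tempfile


class RollingDirectoryWriter:
    """Emulates append by writing records as a series of closed part files.

    Hypothetical helper for illustration; the local filesystem stands in
    for DFS here.
    """

    def __init__(self, directory, records_per_part=3):
        self.directory = directory
        # Assumed "logical boundary": roll to a new file every N records.
        self.records_per_part = records_per_part
        self.part_index = 0
        self.buffer = []
        os.makedirs(directory, exist_ok=True)

    def append(self, record):
        """Buffer one record and roll a new part file at the boundary."""
        self.buffer.append(record)
        if len(self.buffer) >= self.records_per_part:
            self.roll()

    def roll(self):
        """Write buffered records to a new part file and close it."""
        if not self.buffer:
            return
        final_name = os.path.join(self.directory, f"part-{self.part_index:05d}")
        tmp_name = final_name + ".tmp"
        with open(tmp_name, "w", encoding="utf-8") as f:
            for record in self.buffer:
                f.write(record + "\n")
        # Rename after the file is closed, so readers never see a partial part.
        os.replace(tmp_name, final_name)
        self.part_index += 1
        self.buffer = []

    def close(self):
        """Flush any remaining records as a final, possibly short, part."""
        self.roll()


def read_all(directory):
    """Treat the directory as one collection of records, in part order."""
    records = []
    for name in sorted(os.listdir(directory)):
        if name.endswith(".tmp"):
            continue  # skip a part that is still being written
        with open(os.path.join(directory, name), encoding="utf-8") as f:
            records.extend(line.rstrip("\n") for line in f)
    return records


if __name__ == "__main__":
    log_dir = os.path.join(tempfile.mkdtemp(), "log")
    writer = RollingDirectoryWriter(log_dir, records_per_part=2)
    for rec in ["a", "b", "c", "d", "e"]:
        writer.append(rec)
    writer.close()
    print(sorted(os.listdir(log_dir)))
    print(read_all(log_dir))
```

Each part file is written once and then never touched again, which is
what keeps the write-once behaviour Paul describes: a reader sees a
whole part or no part at all. The trade-off is that records still in the
buffer are lost if the writer dies before the next roll, so the boundary
size is a choice between small files and data at risk.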