Mailing-List: contact hadoop-user-help@lucene.apache.org; run by ezmlm
Precedence: bulk
Reply-To: hadoop-user@lucene.apache.org
Received-SPF: pass (asf.osuosl.org: domain of sutter@gmail.com designates
 64.233.162.193 as permitted sender)
Message-ID: <e1d10fc00607141124r74118ed7of379fabcbb030c7d@mail.gmail.com>
Date: Fri, 14 Jul 2006 11:24:44 -0700
From: "Paul Sutter" <sutter@gmail.com>
To: hadoop-user@lucene.apache.org
Subject: Re: What about append in hadoop files ?
In-Reply-To: <44B7DDF4.90709@yahoo-inc.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
References: <20060714080611.98224.qmail@web34308.mail.mud.yahoo.com>
	 <44B763D6.6000503@apache.org> <44B7DDF4.90709@yahoo-inc.com>

When I first started using Hadoop, I was shocked and disturbed that
the append functionality didn't exist.

But as it turns out, we've had no problem at all working around it. I
have grown to really like the simple atomicity of the current
feature set.
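
For anyone wondering what such a workaround looks like, here is a
rough sketch of the directory-of-files approach Konstantin describes
in the quoted text. It is only an illustration: it uses the local
filesystem through Python's standard library rather than the Hadoop
DFS API, the DirectoryLog class and the part-NNNNN naming are made up
for this example, and it assumes records are single-line strings.
Each segment is written under a temporary name and renamed once it is
complete, so readers only ever see whole, closed files.

```python
import os
import tempfile


class DirectoryLog:
    """Hypothetical append-only log kept as a directory of closed files."""

    def __init__(self, path, records_per_file=1000):
        self.path = path
        self.records_per_file = records_per_file
        self._pending = []  # records buffered for the next segment
        os.makedirs(path, exist_ok=True)
        # Resume numbering after segments that already exist
        # (assumes earlier segments were numbered without gaps).
        self._next_index = len(self._segments())

    def _segments(self):
        # Completed segments only; in-progress files end in ".tmp".
        return sorted(
            name for name in os.listdir(self.path)
            if name.startswith("part-") and not name.endswith(".tmp")
        )

    def append(self, record):
        """Buffer one record; roll a new file at the logical boundary."""
        self._pending.append(record)
        if len(self._pending) >= self.records_per_file:
            self.roll()

    def roll(self):
        """Write buffered records to a new segment and publish it."""
        if not self._pending:
            return
        final = os.path.join(self.path, "part-%05d" % self._next_index)
        tmp = final + ".tmp"
        with open(tmp, "w", encoding="utf-8") as f:
            for record in self._pending:
                f.write(record + "\n")
        # The rename makes the finished segment visible in one step.
        os.replace(tmp, final)
        self._pending = []
        self._next_index += 1

    def read_all(self):
        """Treat the directory as one collection of records, in order."""
        records = []
        for name in self._segments():
            with open(os.path.join(self.path, name), encoding="utf-8") as f:
                records.extend(line.rstrip("\n") for line in f)
        return records
```

A caller would create DirectoryLog(tempfile.mkdtemp()), call append()
for each record, and call roll() at whatever logical boundary makes
sense (end of a batch, end of an hour). Records that have not been
rolled yet are not visible to read_all(), which is the price paid for
never exposing a half-written file.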

On 7/14/06, Konstantin Shvachko <shv@yahoo-inc.com> wrote:
> Eric,
>
> I remember Doug advised somebody on a related issue to use a directory
> instead of a file for long-lasting appends.
> You can logically divide your output into smaller files and close them
> whenever the logical boundary is reached.
> The directory can be treated as a collection of records. Maybe this
> will work for you.
> IMO the concurrent append feature is a high priority task.
>
> --Konstantin
>
> Doug Cutting wrote:
>
> > drwho wrote:
> >
> >> If so, is GFS also suitable only for large, offline, batch
> >> computations?
> >> I wonder how Google is going to use GFS for writely or their online
> >> spreadsheet or their BigTable (their gigantic relational DB).
> >
> >
> > Did I say anything about GFS?  I don't think so.  Also, I said,
> > "currently" and "primarily", not "forever" and "exclusively".  I would
> > love for DFS to be more suitable for online, incremental stuff, but
> > we're a ways from that right now.  As I said, we're pursuing
> > reliability, scalability and performance before features like append.
> > If you'd like to try to implement append w/o disrupting work on
> > reliability, scalability and performance, we'd welcome your
> > contributions.  The project direction is determined by contributors.
> >
> > Note that BigTable is a complex layer on top of GFS that caches and
> > batches i/o.  So, while GFS does implement some features that DFS
> > still does not (like appends), GFS is probably not used directly by,
> > e.g., writely.  Finally, BigTable is not relational.
> >
> > Doug
> >
> >> Doug Cutting <cutting@apache.org> wrote: <chopped>
> >>
> >> DFS is currently primarily used to support large, offline, batch
> >> computations.  For example, a log of critical data with tight
> >> transactional requirements is probably an inappropriate use of DFS at
> >> this time.  Again, this may change, but that's where we are now.
> >>
> >> Doug
> >>
> >> Thanks much.
> >>
> >> -eric
> >>