hadoop-common-user mailing list archives

From "Paul Sutter" <sut...@gmail.com>
Subject Re: What about append in hadoop files ?
Date Fri, 14 Jul 2006 18:24:44 GMT
When I first started using Hadoop, I was shocked and disturbed that
the append functionality didn't exist.

But as it turns out, we've had no problem at all working around it. I
have grown to really like the simple atomicity of the current
feature set.
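
For illustration, the directory-instead-of-file workaround that Konstantin
describes in the quoted message below might look roughly like this. It is
only a sketch against the local filesystem using Python's standard library,
not Hadoop's DFS API; the names RollingDirectoryWriter, list_parts, read_all
and the records_per_file parameter are hypothetical, and a fixed record count
stands in for whatever "logical boundary" an application actually has.

```python
import os


def list_parts(directory):
    """Return the names of finished part files, in write order."""
    return sorted(
        name for name in os.listdir(directory)
        if name.startswith("part-") and not name.endswith(".tmp")
    )


class RollingDirectoryWriter:
    """Treat a directory as one logical append-only file (hypothetical helper).

    Records are buffered and written out as a new, immutable part file each
    time a boundary is reached, so no existing file is ever reopened.
    Records are assumed to be strings without embedded newlines.
    """

    def __init__(self, directory, records_per_file=1000):
        self.directory = directory
        self.records_per_file = records_per_file
        os.makedirs(directory, exist_ok=True)
        # Continue numbering after parts left by earlier writers.
        self._next_part = len(list_parts(directory))
        self._buffer = []

    def append(self, record):
        self._buffer.append(record)
        if len(self._buffer) >= self.records_per_file:
            self.close_part()

    def close_part(self):
        """Write buffered records as a new part file and start a fresh buffer."""
        if not self._buffer:
            return
        final = os.path.join(self.directory, f"part-{self._next_part:05d}")
        tmp = final + ".tmp"
        with open(tmp, "w", encoding="utf-8") as f:
            for record in self._buffer:
                f.write(record + "\n")
        # Rename only after the file is complete, so readers never see a
        # partially written part.
        os.replace(tmp, final)
        self._next_part += 1
        self._buffer = []


def read_all(directory):
    """Read the directory back as a single ordered stream of records."""
    for name in list_parts(directory):
        with open(os.path.join(directory, name), encoding="utf-8") as f:
            for line in f:
                yield line.rstrip("\n")
```

A writer would call append() for each record and close_part() at shutdown;
a reader sees the whole directory as one collection of records via
read_all(), which is the "append" from the consumer's point of view.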

On 7/14/06, Konstantin Shvachko <shv@yahoo-inc.com> wrote:
> Eric,
>
> I remember Doug advised somebody on a related issue to use a directory
> instead of a file for long lasting appends.
> You can logically divide your output into smaller files and close them
> whenever the logical boundary is reached.
> The directory can be treated as a collection of records. Maybe this
> will work for you.
> IMO the concurrent append feature is a high priority task.
>
> --Konstantin
>
> Doug Cutting wrote:
>
> > drwho wrote:
> >
> >> If so, GFS, is also suitable only for large, offline, batch
> >> computations ?
> >> I wonder how Google is going to use GFS for writely or their online
> >> spreadsheet or their BigTable (their gigantic relational DB).
> >
> >
> > Did I say anything about GFS?  I don't think so.  Also, I said,
> > "currently" and "primarily", not "forever" and "exclusively".  I would
> > love for DFS to be more suitable for online, incremental stuff, but
> > we're a ways from that right now.  As I said, we're pursuing
> > reliability, scalability and performance before features like append.
> > If you'd like to try to implement append w/o disrupting work on
> > reliability, scalability and performance, we'd welcome your
> > contributions.  The project direction is determined by contributors.
> >
> > Note that BigTable is a complex layer on top of GFS that caches and
> > batches i/o.  So, while GFS does implement some features that DFS
> > still does not (like appends), GFS is probably not used directly by,
> > e.g., writely.  Finally, BigTable is not relational.
> >
> > Doug
> >
> >> Doug Cutting <cutting@apache.org> wrote: <chopped>
> >>
> >> DFS is currently primarily used to support large, offline, batch
> >> computations.  For example, a log of critical data with tight
> >> transactional requirements is probably an inappropriate use of DFS at
> >> this time.  Again, this may change, but that's where we are now.
> >>
> >> Doug
> >>
> >>
> >>
> >>
> >> Thanks much.
> >>
> >> -eric
> >>
> >
> >
> >
>
>
