hadoop-hdfs-user mailing list archives

From Harsh J <ha...@cloudera.com>
Subject Re: append feature in 1.0.X - current stable version
Date Tue, 15 May 2012 04:33:58 GMT

On Tue, May 15, 2012 at 3:55 AM,  <ext-fabio.almeida@nokia.com> wrote:
> Hello All,
> Does anyone know whether all issues with the append functionality have been fixed in the
> latest stable Hadoop version (1.0.X)?

The 0.20-append branch introduced two client-side calls:

1. append() - This is still known to be broken. It lets you reopen an
existing file and add data to the end of it.
2. sync() - This works reliably. It lets a writer immediately flush its
written data to the DataNodes so that new readers can read it properly
(without the file having to be closed).

The first is still broken in 1.0 and has some odd bugs that surface in
certain edge cases, so it is highly recommended not to use it. The second
is what HBase, Flume, etc. use, and it works well.
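To make the sync() guarantee concrete, here is a toy, in-memory Python model of it. It is not Hadoop code and does not use Hadoop's API: the class ToyHdfsFile and its methods are hypothetical, and the model deliberately ignores blocks, the DataNode write pipeline and block-boundary lengths. It only mimics the idea that bytes a writer has written become visible to new readers once sync() is called, while the file stays open.

```python
class ToyHdfsFile:
    """Hypothetical toy model of sync() reader visibility (not Hadoop code)."""

    def __init__(self) -> None:
        self._visible = bytearray()  # bytes that newly opened readers can see
        self._pending = bytearray()  # bytes written but not yet synced
        self.closed = False

    def write(self, data: bytes) -> None:
        if self.closed:
            raise ValueError("cannot write to a closed file")
        self._pending += data  # held on the writer side only

    def sync(self) -> None:
        # Publish everything written so far to new readers;
        # the file stays open for further writes.
        if self.closed:
            raise ValueError("cannot sync a closed file")
        self._visible += self._pending
        self._pending.clear()

    def close(self) -> None:
        # Closing publishes any remaining bytes, then seals the file.
        if not self.closed:
            self.sync()
            self.closed = True

    def open_reader(self) -> bytes:
        # A new reader gets a snapshot of the published bytes only.
        return bytes(self._visible)


f = ToyHdfsFile()
f.write(b"edit-1\n")
print(f.open_reader())  # b''
f.sync()
print(f.open_reader())  # b'edit-1\n'
```

In this model a reader opened before sync() sees none of the pending writes, and one opened after it sees everything written so far, even though the writer never closed the file.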

> I mean, I had a lot of problems with append on Hadoop 0.20.2. I noticed that one of
> the guarantees of the append function, that readers can read data that has been flushed by
> the writer, was not working.

Apache Hadoop 0.20.2 had no append or sync features. I am not sure
what you are calling broken. The sync feature from the 0.20-append branch
(which is present in 0.20.205/1.x, CDH3, etc.) works properly, and
hundreds of HBase users out there rely on it indirectly.

> For that reason they created the config parameter dfs.support.append, which is false
> by default, to be used only on development or test clusters. Is it true by default in the
> latest version?

It isn't true by default now either. However, a recent change has split
the two calls apart: sync() is now enabled by default, while append() is
disabled unless you set dfs.support.broken.append to true.
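As a rough sketch of that split, assuming the flag is set in an hdfs-site.xml-style <configuration> file, the Python below parses such a fragment and reports which calls would be allowed. The helpers read_conf(), append_enabled() and enabled_calls() are hypothetical; Hadoop's own Configuration class does the real parsing, and its exact rules may differ. The default of false for dfs.support.broken.append follows the description above.

```python
import xml.etree.ElementTree as ET

# Hypothetical hdfs-site.xml fragment that opts in to the old append().
HDFS_SITE = """<configuration>
  <property>
    <name>dfs.support.broken.append</name>
    <value>true</value>
  </property>
</configuration>
"""


def read_conf(xml_text: str) -> dict:
    """Collect <property> name/value pairs into a dict (hypothetical helper)."""
    conf = {}
    for prop in ET.fromstring(xml_text).iter("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None and value is not None:
            conf[name.strip()] = value.strip()
    return conf


def append_enabled(conf: dict) -> bool:
    # Off unless explicitly set to "true" (case-insensitive in this sketch).
    return conf.get("dfs.support.broken.append", "false").strip().lower() == "true"


def enabled_calls(conf: dict) -> list:
    # sync() is on by default after the change; append() needs the opt-in flag.
    calls = ["sync"]
    if append_enabled(conf):
        calls.append("append")
    return calls


print(enabled_calls({}))                    # ['sync']
print(enabled_calls(read_conf(HDFS_SITE)))  # ['sync', 'append']
```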

> If the append functionality is not stable yet, does anyone know if there is an estimate
> of when it will be?

0.23/2.0.0 should have a better implementation of it, but I haven't
tested it personally. For most of my use cases, sync() (which in
2.0/0.23 is known as hflush() and hsync()) suffices.

Harsh J
