hadoop-common-user mailing list archives

From Harsh J <ha...@cloudera.com>
Subject Re: Append supported in hadoop 1.0.x branch?
Date Mon, 21 May 2012 11:39:04 GMT

I haven't tested append() enough times to know what triggers it, but I
have often observed, both on the 0.20-append-based clusters I've
troubleshot and on the cdh-users list, that append() has led to odd
trailing block sizes (beyond the maximum allowed) and to intermittent
warnings of corrupt/failed blocks (relating only to the appended files,
not random ones). In a few cases this leads to temporary unavailability
of data, as the client reports all blocks bad to the NN, which is as
much a 'data loss' case as any (for the moment, anyway). I've not seen
permanent or spreading corruption, but this case was odd enough for me
not to recommend append() (though not sync()) on that branch or the
releases based on it. YMMV. I'm unsure of the exact JIRA here, or
whether this is an issue with the new 2.x implementation as well; I'll
let other HDFS devs answer that.

The HBase case speaks only to HBase's own use, which is just sync() at
this point. That is also why
https://issues.apache.org/jira/browse/HADOOP-8230 was done: to separate
the configs that toggle the append() and sync() calls, so the docs
don't look as confusing as they do now.
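[For readers following along, here is a minimal sketch of the two calls
discussed above, against the Hadoop 1.x Java API. It is illustrative only:
the file path and record contents are made up, it needs the Hadoop 1.x jars
and a running HDFS to do anything, and the append() portion is shown purely
to contrast with sync(), given the caveats in this thread.]

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SyncVsAppendSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // In 1.x, append() is gated behind this flag; after HADOOP-8230
        // split the configs, sync() no longer depends on it.
        conf.setBoolean("dfs.support.append", true);

        FileSystem fs = FileSystem.get(conf);
        Path file = new Path("/tmp/events.log"); // hypothetical path

        // Supported/recommended pattern on 1.x: create once, sync() after
        // each record so readers can see the flushed data.
        FSDataOutputStream out = fs.create(file);
        out.writeBytes("record-1\n");
        out.sync();
        out.close();

        // Available but NOT recommended on 1.x: reopen and append().
        FSDataOutputStream appendOut = fs.append(file);
        appendOut.writeBytes("record-2\n");
        appendOut.sync();
        appendOut.close();
    }
}
```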

On Mon, May 21, 2012 at 2:59 PM, Rodney O'Donnell <rodo@rodojojo.com> wrote:
> Thanks again for your response; one more clarification though.
> Are there any conditions under which I can trust append to work?
> For example, if I use ZK to lock the HDFS file to ensure there are no
> concurrent writes, then sync & close the file after each write?
> Also, I assume this has nothing to do with file formats (I was a little
> confused by one of the links below) and that append should not be trusted
> even when using a simple text file.
> Finally, any thoughts on the comment here
> http://hbase.apache.org/book/hadoop.html :
>    Ignore the chicken-little comment you'll find in the hdfs-default.xml
>    in the description for the dfs.support.append configuration; it says
>    it is not enabled because there are “... bugs in the 'append code' and
>    is not supported in any production cluster.”. This comment is stale,
>    from another era, and while I'm sure there are bugs, the sync/append
>    code has been running in production at large scale deploys and is on
>    by default in the offerings of hadoop by commercial vendors
>    [7 <http://hbase.apache.org/book/hadoop.html#ftn.d1905e504>]
>    [8 <http://hbase.apache.org/book/hadoop.html#ftn.d1905e514>]
>    [9 <http://hbase.apache.org/book/hadoop.html#ftn.d1905e520>].
> I guess this comment is only 'chicken-little' for the HBase use case
> (i.e., sync is ok, append is not)?
> Cheers,
> Rod.
> On Fri, May 18, 2012 at 5:58 PM, Harsh J <harsh@cloudera.com> wrote:
>> Rodney,
>> The 0.20-append branch comprised two things that added "append"
>> features. To break them down simply for 1.x:
>> append() - Available: Yes. Supported/Recommended: No.
>> sync() - Available: Yes. Supported/Recommended: Yes.
>> Please also see these links for further info from conversations on this
>> topic that have happened several times before:
>> https://issues.apache.org/jira/browse/HADOOP-8230
>> http://search-hadoop.com/m/638TD3bAXB1
>> http://search-hadoop.com/m/hBPRp1EWELS1
>> Let us know if you have further questions.
>> On Fri, May 18, 2012 at 12:12 PM, Rodney O'Donnell <rodo@rodojojo.com>
>> wrote:
>> > Hi,
>> >
>> > Is FileSystem.append supported on hadoop 1.0.x?  (1.0.3 in particular).
>> >
>> > Reading this list I thought it was back in for 1.0, but it's disabled by
>> > default so I'm not 100% sure.
>> > It would be great to get a definitive answer.
>> >
>> > Cheers,
>> >
>> > Rod.
>> --
>> Harsh J

Harsh J
