hadoop-general mailing list archives

From Arun C Murthy <...@hortonworks.com>
Subject Re: [DISCUSS] Apache Hadoop 1.0?
Date Thu, 17 Nov 2011 19:09:36 GMT

On Nov 17, 2011, at 8:33 AM, Roman Shaposhnik wrote:

> On Thu, Nov 17, 2011 at 2:45 AM, Steve Loughran <stevel@apache.org> wrote:
>> -0.23 is a superset of the MR and HDFS APIs compatible with previous
>> versions (I don't know or care whether or not it is a proper superset or
>> not). The goal here is that end user apps and higher levels in the stack
>> (in-ASF and out-ASF) should work, though testing is required to verify this.
> I believe that by now we have enough factual evidence that at least
> framework-level APIs are incompatible.

Let me clarify to help you understand the distinction.

Both HDFS and MR have 'framework' apis (such as details of NN/DN and JT/TT) and 'end user'
apis (such as open/read/write/close or Mapper/Reducer/InputFormat/OutputFormat etc., more
here: http://hadoop.apache.org/common/docs/stable/mapred_tutorial.html).

hadoop-0.23 aims to be 'compatible' for end-users so that they don't need to modify their
applications to use the new release. Also, we have both the 'old MR apis' and the 'new Context
Objects MR apis' in 0.23.

The 'framework' apis are a different ballgame since the underlying framework, particularly
in MR, has changed significantly. We have replaced the old JT/TT based 'classic' framework
with the new 'yarn' framework consisting of ResourceManager/NodeManager. There are similar,
but more subtle changes in the NameNode/DataNode for HDFS - then there is the append rewrite.
As a result, the wire-protocols have changed significantly, which is why we are bumping up
the 'major version'.

The crux of the matter: end-user applications do NOT need to be _modified_; they just have
to be recompiled against the new libraries.

If you do see any reason for applications to be modified please open a jira and we'll ensure
we get it fixed asap. 

Have you seen any such instance?

> That's exactly why every single downstream component
> needs to be patched at the level of code to work against 0.23.

Now, a downstream project such as HBase, Hive or Pig isn't the 'normal end-user application'.
These projects can choose to use undocumented/non-public (e.g. LimitedPrivate) apis and we
are committed to working with them to ensure a smooth transition.

I don't know which are the ones in 'every single downstream component' - care to enumerate?

The ones I'm aware of, which have since been fixed, are:
https://issues.apache.org/jira/browse/HBASE-4510 -> https://issues.apache.org/jira/browse/HDFS-2412
(we fixed the internal HDFS apis so that HBase can continue to use them)
https://issues.apache.org/jira/browse/PIG-2125 -> https://issues.apache.org/jira/browse/MAPREDUCE-3138
(we fixed MR to allow apps to deal with inconsistency in 'new' MR apis which changed in 0.21).

I'm not aware of anything else - what else do you see?

As a result, the downstream projects ensure that their own end-users and applications (HBase apps,
Pig scripts, Hive queries, etc.) do NOT see any incompatibilities.


In summary, please take a careful look at the 'factual information' before you decide to proclaim
your beliefs about important aspects such as 'incompatibility' - it's key that we don't
confuse end-users and that newer releases see smooth adoption.

