hadoop-common-dev mailing list archives

From Zhijie Shen <zs...@hortonworks.com>
Subject Re: MapReduce V1 vs MapReduce V2
Date Fri, 03 Jan 2014 19:25:08 GMT
Hi Matt,

In my opinion, the basic difference between MapReduce V1 and V2 is not
about the mapred or mapreduce API package, but about the platform that runs
the job. In MapReduce V1, the job was managed by the JobTracker and the
TaskTrackers. In MapReduce V2, the resource management part of the
MapReduce project has been spun off and has evolved into YARN, a
generic distributed resource management system. MapReduce, as well as other
types of applications, can run on this common platform. The remaining part,
which is the code base of MapReduce V2, is a pure distributed computation
framework.
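
As a rough illustration of the platform point: on a Hadoop 2 cluster, the
runtime that a job is submitted to is selected by the
mapreduce.framework.name property, typically set in mapred-site.xml. The
fragment below is only a minimal sketch that assumes an otherwise default
configuration; a real cluster needs further YARN and MapReduce settings that
are omitted here.

```xml
<!-- mapred-site.xml (sketch only; all other properties omitted) -->
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <!-- "yarn" submits jobs to YARN; "classic" targets a JobTracker;
         "local" runs the job in-process -->
    <value>yarn</value>
  </property>
</configuration>
```

The job code itself does not change with this setting; only the platform
that schedules and runs the tasks does.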

With regard to the API packages, both mapred.* and mapreduce.* have
existed since MapReduce V1, but mapreduce.* has been evolving a lot. If
you're writing a new MapReduce application against the latest Hadoop
libraries, it's MapReduce V2 no matter whether you're using mapred.* or
mapreduce.*. If you already have some MapReduce applications that were built
with the MapReduce V1 framework and use the mapred.* APIs, they are supposed
to run on YARN without problems. However, if those applications use the
mapreduce.* APIs, you may need to recompile them against the MapReduce V2
framework to be able to run them on YARN.

Here're a bunch of resources that you may want to have a look for further


On Fri, Jan 3, 2014 at 2:19 AM, Matt Fellows <
matt.fellows@bespokesoftware.com> wrote:

> I'm thoroughly confused about which API is the recent one, which is the
> old one and which method I should be using to write MapReduce applications.
> I'm under the impression that MRv2 is primarily driven by the
> org.apache.hadoop.mapreduce.* packages and MRv1 is primarily driven by the
> org.apache.hadoop.mapred.* packages.
> I've been led to believe that MRv2 applications extend MapReduceBase and
> implement Mapper, Reducer etc.
> and conversely the MRv1 applications extend Mapper, Reducer directly.
> However, I cannot find a canonical statement to back any of this up.
> What's more, I keep finding conflicting statements about these, such as
> "'Hadoop - the definitive guide' gives example in MRv2 format" but then I
> look at the examples and they use org.apache.hadoop.mapreduce.* packages,
> but extend Mapper and extend Reducer, not MapReduceBase...
> Can someone either point me at a canonical resource or just confirm / deny
> my assumptions?
> Kind regards

Zhijie Shen
Hortonworks Inc.

