hadoop-common-user mailing list archives

From Harsh J <ha...@cloudera.com>
Subject Re: Force one mapper per machine (not core)?
Date Sat, 01 Feb 2014 00:46:39 GMT
If you're using a JobTracker, it's MR1.
On Feb 1, 2014 12:23 AM, "Keith Wiley" <kwiley@keithwiley.com> wrote:

> Hmmm, okay.  I know it's running CDH4 4.4.0, but as for whether it was
> specifically configured with MR1 or MR2 (is there a distinction between MR2
> and Yarn?) I'm not absolutely certain.  I know that the cluster "behaves"
> like the MR1 clusters I've worked with for years (I interact with the job
> tracker in a classical way for example).  Can I tell whether it's MR1 or
> MR2 from the job tracker or namenode web UIs?
> Thanks.
> On Jan 29, 2014, at 00:52 , Harsh J wrote:
> > Is your cluster running MR1 or MR2? On MR1, the CapacityScheduler
> > would allow you to do this if you used appropriate memory based
> > requests (see http://search-hadoop.com/m/gnFs91yIg1e), and on MR2
> > (depending on the YARN scheduler resource request limits config) you
> > can request that your job run with the largest allowed resource requests,
> > which would soak up all provided resources (CPU and memory) of a node so
> > that only one container runs on a host at any given time.
> >
> > On Wed, Jan 29, 2014 at 3:30 AM, Keith Wiley <kwiley@keithwiley.com> wrote:
> >> I'm running a program which in the streaming layer automatically
> >> multithreads and does so by automatically detecting the number of cores on
> >> the machine.  I realize this model is somewhat in conflict with Hadoop, but
> >> nonetheless, that's what I'm doing.  Thus, for even resource utilization,
> >> it would be nice to assign not just one mapper per core, but only one
> >> mapper per machine.  I realize that if I saturate the cluster none of this
> >> really matters, but consider the following example for clarity: 4-core
> >> nodes, 10-node cluster, thus 40 slots, fully configured across mappers and
> >> reducers (40 slots of each).  Say I run this program with just two mappers.
> >> It would run much more efficiently (in essentially half the time) if I
> >> could force the two mappers to go to slots on two separate machines instead
> >> of running the risk that Hadoop may assign them both to the same machine.
> >>
> >> Can this be done?
> >>
> >> Thanks.
> ________________________________________________________________________________
> Keith Wiley     kwiley@keithwiley.com     keithwiley.com
> music.keithwiley.com
> "I used to be with it, but then they changed what it was.  Now, what I'm
> with isn't it, and what's it seems weird and scary to me."
>                                            --  Abe (Grandpa) Simpson
> ________________________________________________________________________________
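
The memory-based approach suggested in the thread can be sketched as a small helper that builds the -D options for a streaming job. This is only an illustration, not a tested recipe for this cluster: the function name and the 2048 MB slot size are hypothetical, and the 4 map slots per node come from the example in the original question. It assumes that on MR1 the CapacityScheduler has memory-based scheduling enabled and that mapred.cluster.max.map.memory.mb permits the request, and that on MR2 the YARN scheduler limits (for example yarn.scheduler.maximum-allocation-mb) allow a container as large as a whole node.

```python
def one_map_per_node_flags(framework, slot_mb=None, map_slots_per_node=None,
                           node_mb=None, node_vcores=None):
    """Build -D options asking for a whole node's worth of resources per map task.

    Hypothetical helper: all sizes must be taken from the real cluster config.
    """
    if framework == "mr1":
        # With memory-based scheduling, a map task that asks for N slots'
        # worth of memory is charged N slots, so asking for every map slot
        # on a node leaves no room for a second map task there.
        request_mb = slot_mb * map_slots_per_node
        return ["-D", f"mapred.job.map.memory.mb={request_mb}"]
    if framework == "mr2":
        # On YARN, request a container sized to the node's advertised
        # resources; vcores only matter if the scheduler accounts for CPU.
        return ["-D", f"mapreduce.map.memory.mb={node_mb}",
                "-D", f"mapreduce.map.cpu.vcores={node_vcores}"]
    raise ValueError(f"unknown framework: {framework!r}")


# Example values: 4 map slots per node (from the question), 2048 MB per slot (assumed).
mr1_flags = one_map_per_node_flags("mr1", slot_mb=2048, map_slots_per_node=4)
mr2_flags = one_map_per_node_flags("mr2", node_mb=8192, node_vcores=4)
print(" ".join(mr1_flags))
print(" ".join(mr2_flags))
```

The resulting options would be passed as generic options on the hadoop streaming command line, before the streaming-specific ones. Whether the scheduler honours them depends on the cluster-side limits mentioned above.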
