hadoop-common-user mailing list archives

From "N.N. Gesli" <nnge...@gmail.com>
Subject Re: Map-Reduce in memory
Date Fri, 04 Nov 2011 06:46:02 GMT
Thank you very much for your replies.

Michel, disk is 3TB (6x550GB; 50GB of each disk is reserved for local use,
basically for mapred.local.dir). You are right on the CPU; it is 8-core but
shows as 16. Does that mean it can handle 16 JVMs at a time? The CPU is a
little overloaded, but that is not a huge problem at this point.
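On the hyper-threading point: the OS reports logical CPUs, so an 8-core box with HT enabled shows 16, but the slot math should really be done against physical cores. A rough sketch of the arithmetic (slot counts taken from this thread; note `os.cpu_count()` reports logical CPUs only):

```python
import os

def worst_case_jvms_per_core(map_slots, reduce_slots, physical_cores):
    """Peak number of concurrently running task JVMs per physical core."""
    return (map_slots + reduce_slots) / physical_cores

# Logical CPUs as the OS sees them; 16 on an 8-core box with hyper-threading.
print(os.cpu_count())

# 16 map slots + 8 reduce slots on 8 physical cores:
print(worst_case_jvms_per_core(16, 8, 8))  # 3.0 JVMs per core at peak
```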

I set io.sort.factor to 200 and io.sort.mb to 2000 and still got the same
error/timeout. I played with all the related conf settings one by one. In the
end, changing mapred.job.shuffle.merge.percent from 1.0 back to 0.66 solved
the problem.
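For reference, the settings mentioned above as they would appear in mapred-site.xml (property names are the Hadoop 0.20.x ones; values are the ones tried in this thread):

```xml
<!-- Sort/shuffle tuning from this thread (Hadoop 0.20.x names) -->
<property>
  <name>io.sort.factor</name>
  <value>200</value>   <!-- streams merged at once during sorts -->
</property>
<property>
  <name>io.sort.mb</name>
  <value>2000</value>  <!-- map-side sort buffer, in MB -->
</property>
<property>
  <name>mapred.job.shuffle.merge.percent</name>
  <value>0.66</value>  <!-- reverting this from 1.0 fixed the timeout -->
</property>
```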

However, the job is still taking a long time. There are 84 reducers, but
only one of them takes a very long time. I attached the log file of that
reduce task. The majority of the data gets spilled to disk. Even with
mapred.child.java.opts set to 6144, the reduce task log shows

ShuffleRamManager: MemoryLimit=1503238528, MaxSingleShuffleLimit=375809632

as if the maximum memory were 2GB (70% of 2GB = 1503238528 bytes). Later in
the same log file there is also this line:

INFO ExecReducer: maximum memory = 6414139392

I am not using memory monitoring. Tasktrackers have this line in the log:

TaskTracker's totalMemoryAllottedForTasks is -1. TaskMemoryManager is disabled.

Why is ShuffleRamManager computing that limit as if the maximum memory were
2GB?
Why am I still getting that much spill even with these aggressive memory
settings?
Why is only one reducer taking that long?
What else can I change to make this job run in memory and finish faster?
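Regarding the first question, one hedged explanation, assuming the 0.20.x ReduceTask code shipped in CDH3: the ShuffleRamManager clamps the heap size to Integer.MAX_VALUE before applying mapred.job.shuffle.input.buffer.percent (default 0.70), because the in-memory shuffle buffer is backed by a single byte array. If so, a heap beyond ~2GB never enlarges the shuffle buffer, and replaying Java's float arithmetic reproduces both numbers in the log line above:

```python
import struct

def f32(x):
    """Round to the nearest IEEE-754 single, mimicking Java float arithmetic."""
    return struct.unpack('f', struct.pack('f', x))[0]

INT_MAX = 2**31 - 1  # Integer.MAX_VALUE

def shuffle_limits(max_heap_bytes,
                   input_buffer_percent=0.70,      # mapred.job.shuffle.input.buffer.percent
                   single_segment_fraction=0.25):  # largest single in-memory segment
    # Assumed 0.20.x behavior: clamp the heap to Integer.MAX_VALUE before
    # applying the buffer percentage, so -Xmx6144m still yields a ~1.5GB buffer.
    base = min(max_heap_bytes, INT_MAX)
    memory_limit = int(f32(f32(base) * f32(input_buffer_percent)))
    max_single = int(memory_limit * single_segment_fraction)
    return memory_limit, max_single

limit, single = shuffle_limits(6 * 1024**3)  # 6GB heap, as in the task log
print(limit, single)  # 1503238528 375809632, matching the ShuffleRamManager line
```

If that cap is indeed what you are hitting, the practical levers are mapred.job.shuffle.input.buffer.percent and the reduce-side merge thresholds, not a larger -Xmx.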

Thank you.
-N.N.Gesli

On Fri, Oct 28, 2011 at 2:14 AM, Michel Segel <michael_segel@hotmail.com> wrote:

> Uhm...
> He has plenty of memory... Depending on what sort of m/r tasks... He could
> push it.
> Didn't say how much disk...
>
> I wouldn't start that high... Try 10 mappers and 2 reducers. Granted it
> is a bit asymmetric and you can bump up the reducers...
>
> Watch your jobs in ganglia and see what is happening...
>
> Harsh, assuming he is using intel, each core is hyper threaded so the box
> sees this as 2x CPUs.
> 8 cores looks like 16.
>
>
> Sent from a remote device. Please excuse any typos...
>
> Mike Segel
>
> On Oct 28, 2011, at 3:08 AM, Harsh J <harsh@cloudera.com> wrote:
>
> > Hey N.N. Gesli,
> >
> > (Inline)
> >
> > On Fri, Oct 28, 2011 at 12:38 PM, N.N. Gesli <nngesli@gmail.com> wrote:
> >> Hello,
> >>
> >> We have a 12-node Hadoop cluster running Hadoop 0.20.2-cdh3u0. Each
> >> node has 8 cores and 144GB RAM (don't ask). So, I want to take
> >> advantage of this huge RAM and run the map-reduce jobs mostly in
> >> memory, with no spill if possible. We use Hive for most of the
> >> processes. I have set:
> >> mapred.tasktracker.map.tasks.maximum = 16
> >> mapred.tasktracker.reduce.tasks.maximum = 8
> >
> > This is *crazy* for an 8-core machine. Try to keep M+R slots well
> > below 8 instead; you're probably CPU-thrashed in this setup once a
> > large number of tasks get booted.
> >
> >> mapred.child.java.opts = 6144
> >
> > You can also raise io.sort.mb to 2000, and tweak io.sort.factor.
> >
> > Raising the child opts to ~6GB looks a bit unnecessary, since most of
> > your tasks work on a per-record basis and would not care much about
> > total RAM. Perhaps use all that RAM for a service like HBase, which
> > can leverage caching nicely!
> >
> >> One of my Hive queries is producing a chain of 6 map-reduce stages. In
> >> the third stage, when it queries a 200GB table, the last 14 reducers
> >> hang. I changed mapred.task.timeout to 0 to see if they really hang. It
> >> has been 5 hours, so something is terribly wrong in my setup. Parts of
> >> the log are below.
> >
> > It is probably just your slot settings. You may be massively
> > over-subscribing your CPU resources with 16 map task slots + 8 reduce
> > task slots. In the worst case, that means 24 JVMs competing over
> > 8 available physical processors. Doesn't make sense to me at least;
> > make it more like 7 M / 2 R or so :)
> >
> >> My questions:
> >> * What should my configuration be to make the reducers run in memory?
> >> * Why does it keep waiting for map outputs?
> >
> > It has to fetch map outputs to get some data to start with. And it
> > pulls the map outputs a few at a time, so as not to overload the
> > network during the shuffle phases of several reducers across the
> > cluster.
> >
> >> * What does "dup hosts" mean?
> >
> > Duplicate hosts. Hosts it already knows about and has already
> > scheduled fetch work upon.
> >
> > <snip>
> >
> > --
> > Harsh J
> >
>
