hadoop-common-user mailing list archives

From Adi <adi.pan...@gmail.com>
Subject Re: Suggestions for swapping issue
Date Wed, 11 May 2011 18:30:37 GMT
Actually per node it is 56 + 12 = 68 slots (not mappers/reducers).
With the job's configuration it was using 6 slots per mapper (resulting in 8-9
mappers) and 6 slots per reducer (1 reducer).
There was a mistake in my earlier mails: the map slots are 56, not 48, but the
total memory allocation for hadoop still comes to around 35-36 GB.
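
For reference, the slot arithmetic can be sketched as a quick back-of-the-envelope
calculation. This is a minimal sketch; the 512 MB-per-slot figure is an assumption
inferred from the "48 map slots (24 GB)" numbers quoted below, not a value taken
from the actual config:

```shell
#!/bin/sh
# Hypothetical per-node memory budget for the slot counts discussed above.
# Assumes 512 MB per slot (inferred from "48 map slots (24 GB)" in the
# quoted mail) and ~3 GB total for the DataNode/TaskTracker/JobTracker
# daemons; adjust both to match your actual configuration.
MAP_SLOTS=56
REDUCE_SLOTS=12
SLOT_MB=512
DAEMON_GB=3

SLOT_GB=$(( (MAP_SLOTS + REDUCE_SLOTS) * SLOT_MB / 1024 ))
TOTAL_GB=$(( SLOT_GB + DAEMON_GB ))

echo "slot memory:  ${SLOT_GB} GB"
echo "with daemons: ${TOTAL_GB} GB of 48 GB"
```

Under these assumptions a 6-slot mapper gets 6 x 512 MB = 3 GB, which lines up
with the 3 GB per-task limit mentioned further down the thread.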

-Adi



On Wed, May 11, 2011 at 2:16 PM, Ted Dunning <tdunning@maprtech.com> wrote:

> How is it that 36 processes are not expected if you have configured 48 + 12
> = 60 slots available on the machine?
>
> On Wed, May 11, 2011 at 11:11 AM, Adi <adi.pandit@gmail.com> wrote:
>
> > By our calculations hadoop should not exceed 70% of memory.
> > Allocated per node: 48 map slots (24 GB), 12 reduce slots (6 GB), and 1 GB
> > each for the DataNode, TaskTracker, and JobTracker, totalling 33-34 GB.
> > The queues are capped at 90% of allocated capacity, so generally 10% of
> > the slots are always kept free.
> >
> > The cluster was running 33 mappers and 1 reducer in total, so around 8-9
> > mappers per node with a 3 GB max limit, and they were utilizing around
> > 2 GB each.
> > Top was showing 100% memory utilized, which our sys admin says is ok, as
> > linux uses the memory for file caching when the processes are not using it.
> > There was no swapping on 3 of the nodes.
> > Then node4 just started swapping after the number of processes shot up
> > unexpectedly. The main mystery is the excess number of processes on the
> > node which went down: 36, as opposed to the expected 11. The other 3 nodes
> > were successfully executing the mappers without any memory/swap issues.
> >
> > -Adi
> >
> > On Wed, May 11, 2011 at 1:40 PM, Michel Segel
> > <michael_segel@hotmail.com> wrote:
> >
> > > You have to do the math...
> > > If you have 2 GB per mapper and run 10 mappers per node, that means
> > > 20 GB of memory.
> > > Then you have the TT and DN running, which also take memory...
> > >
> > > What did you set as the number of mappers/reducers per node?
> > >
> > > What do you see in ganglia or when you run top?
> > >
> > > Sent from a remote device. Please excuse any typos...
> > >
> > > Mike Segel
> > >
> > > On May 11, 2011, at 12:31 PM, Adi <adi.pandit@gmail.com> wrote:
> > >
> > > > Hello Hadoop Gurus,
> > > > We are running a 4-node cluster. We just upgraded the RAM to 48 GB.
> > > > We have allocated around 33-34 GB per node for hadoop processes,
> > > > leaving the remaining 14-15 GB for the OS and as a buffer. There are
> > > > no other processes running on these nodes.
> > > > Most of the lighter jobs run successfully, but one big job is
> > > > de-stabilizing the cluster. One node starts swapping, runs out of swap
> > > > space, and goes offline. We tracked the processes on that node and
> > > > noticed that it ends up with more hadoop java processes than expected.
> > > > The other 3 nodes were running 10 or 11 processes, but this node ends
> > > > up with 36. After killing the job we find these processes still show
> > > > up, and we have to kill them manually.
> > > > We have tried reducing the swappiness to 6 but saw the same results.
> > > > It also looks like hadoop stays well within the allocated memory
> > > > limits and still starts swapping.
> > > >
> > > > Some other suggestions we have seen are:
> > > > 1) Increase the swap size. The current size is 6 GB. The most quoted
> > > > size is 'tons of swap', but we are not sure what that translates to in
> > > > numbers. Should it be 16 or 24 GB?
> > > > 2) Increase the overcommit ratio. Not sure about this one, as a few
> > > > blog comments mentioned it didn't help.
> > > >
> > > > Any other hadoop or linux config suggestions are welcome.
> > > >
> > > > Thanks.
> > > >
> > > > -Adi
> > >
> >
>
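
For anyone finding this thread later: the two linux-side knobs discussed above
(swappiness and the overcommit ratio) can be inspected and changed roughly as
sketched below. The write commands require root and are commented out; the
values shown are simply the ones mentioned in this thread, not recommendations:

```shell
#!/bin/sh
# Inspect the current settings (read-only, safe to run anywhere).
cat /proc/sys/vm/swappiness        # how eagerly the kernel swaps; often 60
cat /proc/sys/vm/overcommit_ratio  # only consulted when overcommit_memory=2

# Apply at runtime (requires root); these do not survive a reboot:
# sysctl -w vm.swappiness=6
# sysctl -w vm.overcommit_ratio=80

# Make the change persistent across reboots:
# echo 'vm.swappiness = 6' >> /etc/sysctl.conf
```

Note that the overcommit ratio only has an effect if vm.overcommit_memory is
set to 2 (strict accounting), which may explain why some of the blog comments
mentioned above saw no difference from changing it alone.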
