hbase-user mailing list archives

From Tianying Chang <tychang@gmail.com>
Subject Re: WALPlayer kills many RS when play large number of WALs
Date Tue, 22 Jul 2014 18:32:58 GMT
Andrew

You are right. These are m1.xlarge instances in our hbasetest cluster. Our
production cluster mostly uses i2 instances. I will do some math and
configure it accordingly to prevent these problems.

I reduced the per-TaskTracker task count and the task memory to 1G, and I
will see if/when the Java Heap Space failure shows up again.
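
Concretely, this is roughly what that looks like in mapred-site.xml (a
sketch with MRv1 property names; 1 map slot + 1 reduce slot and a 1G task
heap, and the TaskTrackers need a restart to pick it up):

  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>1</value>
  </property>
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>1</value>
  </property>
  <property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx1024m</value>
  </property>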

Thanks a lot for your tips.

Thanks
Tian-Ying


On Tue, Jul 22, 2014 at 10:20 AM, Andrew Purtell <apurtell@apache.org>
wrote:

> > The node has only 15G memory.
>
> EC2 m1.xlarge or m3.xlarge ? You might find some of the new types with more
> memory have better price-performance value. If you are on EC2 and are
> colocating mapreduce with HBase, you'll want more RAM *and* vCPU I think.
>
> > But will that cause a Java Heap Space problem for the mapreduce job when
> > the WALPlayer reducer is running?
>
> If you add "-XX:+HeapDumpOnOutOfMemoryError" to '
> mapred.child.java.opts' then there might be a retrievable heap dump left
> around on the worker nodes. Not sure the precise location offhand, pardon,
> it's been a while since I've debugged mapreduce. You can then use jhat to
> analyze the heap dump. The types of the top 10 or 20 most frequently
> allocated objects would be interesting.
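>
> Something along these lines should do it (a sketch; the dump path and the
> pid in the dump filename below are illustrative):
>
>   hbase org.apache.hadoop.hbase.mapreduce.WALPlayer \
>     -Dmapred.child.java.opts='-Xmx2048m -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp' \
>     <wal inputdir> <tables>
>
>   # then, on a worker node that hit the OOME:
>   jhat -J-Xmx4g /tmp/java_pid12345.hprof   # serves a browsable report on port 7000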
>
>
> On Tue, Jul 22, 2014 at 9:58 AM, Tianying Chang <tychang@gmail.com> wrote:
>
> > Andrew
> >
> > Thanks for your answer! I think you are right. The node has only 15G
> > memory. We configured it to run the RS with 12G, and then we configured 4
> > mappers and 4 reducers on each node, each using 2G of memory. That
> > probably caused the RS to be killed by the OOM killer:
> >
> > mapred.child.java.opts = -Xmx2048m
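> >
> > So the worst case per node was roughly:
> >
> >   12G (RS) + (4 mappers + 4 reducers) x 2G = 28G  >  15G physical RAM
> >
> > leaving nothing for the DataNode, TaskTracker, or the OS, so the kernel
> > OOM killer went after the biggest process (the RS).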
> > I have another question. If I change the mappers/reducers per node to 1
> > and lower mapred.child.java.opts to 512M, I think that will prevent the
> > RS from being killed due to OOM. But will that cause a Java Heap Space
> > problem for the mapreduce job when the WALPlayer reducer is running? I
> > saw a post saying that the recommended fix for the Java Heap Space error
> > in a mapreduce job is to increase mapred.child.java.opts:
> >
> > http://stackoverflow.com/questions/8464048/out-of-memory-error-in-hadoop
> >
> > Thanks
> > Tian-Ying
> >
> >
> >
> >
> >
> > On Tue, Jul 22, 2014 at 9:14 AM, Andrew Purtell <apurtell@apache.org>
> > wrote:
> >
> > > Accidentally hit send too soon.
> > > A good rule of thumb is that the aggregate of all Java heaps (daemons
> > > like DataNode, RegionServer, NodeManager, etc. + the max allowed number
> > > of concurrent mapreduce tasks * task heap setting) should fit into
> > > available RAM.
> > >
> > > If you don't have enough available RAM, then you need to take steps to
> > > reduce resource consumption. Limit the allowed number of concurrent
> > > mapreduce tasks. Reduce the heap size specified in
> > > 'mapred.child.java.opts'. Or both.
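> > >
> > > In other words, size things so that:
> > >
> > >   sum(daemon heaps) + (max concurrent tasks x task heap) + OS headroom <= physical RAM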
> > >
> > >
> > > On Tue, Jul 22, 2014 at 9:12 AM, Andrew Purtell <apurtell@apache.org>
> > > wrote:
> > >
> > > > You need to better manage the colocation of the mapreduce runtime. In
> > > > other words, you are allowing mapreduce to grab too many node
> > > > resources, resulting in activation of the kernel's OOM killer.
> > > > A good rule of thumb is the aggregate of all Java heaps (daemons like
> > > > DataNode, RegionServer, NodeManager, etc. + the max allowed number of
> > > > concurrent mapreduce tasks * task heap setting). Reduce the allowed
> > > > mapreduce task concurrency.
> > > >
> > > >
> > > > On Tue, Jul 22, 2014 at 8:15 AM, Tianying Chang <tychang@gmail.com>
> > > wrote:
> > > >
> > > >> Hi
> > > >>
> > > >> I was running WALPlayer to output HFiles for a future bulk load. There
> > > >> are 6200 hlogs, and the total size is about 400G.
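> > > >>
> > > >> The invocation was roughly like this (paths and table name are
> > > >> placeholders; the bulk-output key may be hlog.bulk.output or
> > > >> wal.bulk.output depending on HBase version):
> > > >>
> > > >>   hbase org.apache.hadoop.hbase.mapreduce.WALPlayer \
> > > >>     -Dhlog.bulk.output=/user/me/hfile-out \
> > > >>     /hbase/.oldlogs mytable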
> > > >>
> > > >> The mapreduce job finished, but I saw two bad things:
> > > >> 1. More than half of the RS died. I checked the syslog; it seems they
> > > >> were killed by the OOM killer. They also had a very high CPU spike the
> > > >> whole time WALPlayer was running:
> > > >>
> > > >> cpu user usage of 84.4% matches resource limit [cpu user usage>70.0%]
> > > >>
> > > >> 2. The mapreduce job also had Java Heap Space failures. My job set the
> > > >> task heap to 2G:
> > > >>
> > > >> mapred.child.java.opts = -Xmx2048m
> > > >>
> > > >> Does this mean WALPlayer cannot support this load with this kind of
> > > >> setting?
> > > >>
> > > >> Thanks
> > > >> Tian-Ying
> > > >>
> > > >
> > > >
> > > >
> > > > --
> > > > Best regards,
> > > >
> > > >    - Andy
> > > >
> > > > Problems worthy of attack prove their worth by hitting back. - Piet Hein
> > > > (via Tom White)
> > > >
> > >
> > >
> > >
> > > --
> > > Best regards,
> > >
> > >    - Andy
> > >
> > > Problems worthy of attack prove their worth by hitting back. - Piet Hein
> > > (via Tom White)
> > >
> >
>
>
>
> --
> Best regards,
>
>    - Andy
>
> Problems worthy of attack prove their worth by hitting back. - Piet Hein
> (via Tom White)
>
