hbase-user mailing list archives

From Todd Lipcon <t...@cloudera.com>
Subject Re: GC [ParNew...] took 299 secs causing region server to die
Date Fri, 30 Jul 2010 00:26:14 GMT
Agree with what JD said - also check for swapping on the machine. GC can
take forever if any of the Java heap gets swapped out, since GC by its
nature has to traverse most of the pages in the heap.
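One way to check for swapping on Linux is to sample the cumulative pswpin/pswpout page counters in /proc/vmstat (the si/so columns of `vmstat 5` show the same activity). A minimal sketch, assuming a Linux host; the function names are hypothetical helpers, not part of HBase or Hadoop.

```python
import time


def read_swap_counters(vmstat_text):
    """Parse the cumulative swap-in/swap-out page counts from /proc/vmstat text."""
    counters = {"pswpin": 0, "pswpout": 0}
    for line in vmstat_text.splitlines():
        fields = line.split()
        if len(fields) == 2 and fields[0] in counters:
            counters[fields[0]] = int(fields[1])
    return counters


def swap_delta(before, after):
    """Pages swapped in and out between two samples."""
    return {key: after[key] - before[key] for key in ("pswpin", "pswpout")}


def sample_swap(interval_secs=10, path="/proc/vmstat"):
    """Take two samples interval_secs apart (Linux only) and return the delta."""
    with open(path) as f:
        first = read_swap_counters(f.read())
    time.sleep(interval_secs)
    with open(path) as f:
        second = read_swap_counters(f.read())
    return swap_delta(first, second)
```

Calling sample_swap() while the import is running and getting non-zero deltas would mean pages are moving to or from swap, which is exactly what stalls a collection that has to walk the heap.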

-Todd

On Thu, Jul 29, 2010 at 3:41 PM, Jean-Daniel Cryans <jdcryans@apache.org> wrote:

> Well, it says: Times: user=0.17 sys=0.04, real=299.23 secs
>
> So why did it take 0.04 secs of system time but 300 secs of real time?
> That's insane. Either the region server process was completely starved
> of CPU cycles (are you on EC2 or any virtualized service like that?),
> or the computer was put to sleep. You have 4 CPUs and 6 processes
> using them (4 tasks, datanode, regionserver, and you should also count
> the OS itself), so maybe you are overcommitting the available resources?
>
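The check J-D makes by hand (wall-clock time far above the CPU time the collector actually used) can be applied to a whole GC log. A rough sketch, assuming the `[Times: user=... sys=..., real=... secs]` suffix format seen in the log quoted in this thread; find_starved_pauses is a hypothetical helper, and the 1-second and 10x thresholds are arbitrary illustrative choices.

```python
import re

# Matches the "[Times: user=U sys=S, real=R secs]" suffix of each GC log entry.
TIMES_RE = re.compile(r"\[Times: user=([\d.]+) sys=([\d.]+), real=([\d.]+) secs\]")


def find_starved_pauses(log_text, min_real_secs=1.0, ratio=10.0):
    """Return (user, sys, real) tuples for pauses whose wall-clock time is at
    least min_real_secs and more than `ratio` times the CPU time (user + sys)."""
    suspicious = []
    for match in TIMES_RE.finditer(log_text):
        user, sys_time, real = (float(g) for g in match.groups())
        cpu = user + sys_time
        # max(...) avoids flagging everything when the CPU time rounds to 0.00
        if real >= min_real_secs and real > ratio * max(cpu, 0.01):
            suspicious.append((user, sys_time, real))
    return suspicious
```

A pause whose user+sys time is a fraction of a second while its real time runs to minutes points away from the collector's own work and toward the process not being scheduled (CPU starvation, a suspended VM) or waiting on swapped-out pages.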
> Also, instead of writing to the region servers from your MR job, you should consider using
>
> http://hbase.apache.org/docs/r0.20.5/api/org/apache/hadoop/hbase/mapreduce/HFileOutputFormat.html
>
> J-D
>
> On Thu, Jul 29, 2010 at 1:17 PM, Steve Kuo <kuosenhao@gmail.com> wrote:
> > I kept running into stop-the-world GC pauses during a batch import of data
> > into hbase. The configuration of each node in the 8-node cluster is as follows.
> >
> > * 4-core
> > * 64-bit JVM
> > * 8 GB of memory
> > * CDH2 for hadoop and 0.20.5 for hbase
> > * TT: 128 MB
> > * DN: 128 MB
> > * 2 Mappers at 512 MB each
> > * 2 Reducers at 512 MB each
> > * 1 regionserver at 4096 MB
> >
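Summing the heap sizes above gives a rough memory budget for one node. A back-of-the-envelope sketch, assuming the listed sizes are the maximum heap of each JVM and that "8 GB" means 8192 MB; every JVM also needs non-heap memory (thread stacks, permgen, native buffers), so the real headroom is smaller than what this prints. The process names in HEAPS_MB are illustrative labels.

```python
# Per-process heap sizes in MB, from the node configuration listed above.
HEAPS_MB = {
    "tasktracker": 128,
    "datanode": 128,
    "mapper_1": 512,
    "mapper_2": 512,
    "reducer_1": 512,
    "reducer_2": 512,
    "regionserver": 4096,
}
NODE_RAM_MB = 8 * 1024  # assumes "8 GB" means 8192 MB


def heap_budget(heaps_mb, ram_mb):
    """Return (total configured heap, RAM left for the OS, page cache and JVM overhead)."""
    total = sum(heaps_mb.values())
    return total, ram_mb - total


# All slots busy: prints (6400, 1792)
print(heap_budget(HEAPS_MB, NODE_RAM_MB))

# Map-only import, no reducers running: prints (5376, 2816)
map_only = {name: mb for name, mb in HEAPS_MB.items() if not name.startswith("reducer")}
print(heap_budget(map_only, NODE_RAM_MB))
```

Neither figure counts non-heap JVM memory or the OS itself, so ruling out swapping on such a node is worthwhile before tuning the collector.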
> > The import job was a map-only job, so only the TT, DN, 2 mappers and the
> > regionserver were running. Below is the JMX output for the dead
> > regionserver.
> >
> > Time: 2010-07-29 12:25:47
> > Used: 224,949 kbytes
> > Committed: 670,728 kbytes
> > Max: 4,185,792 kbytes
> > GC time:
> >   5 minutes on ParNew (2,126 collections)
> >   0.000 seconds on ConcurrentMarkSweep (0 collections)
> >
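The JMX totals above also show how lopsided the ParNew time is. Rough arithmetic, assuming the console's "5 minutes" is a rounded figure and taking the 299.15-sec outlier from the third entry of the GC log quoted in this message:

```python
total_parnew_secs = 5 * 60    # "5 minutes on ParNew", rounded by the JMX console
collections = 2126            # "(2,126 collections)"
outlier_secs = 299.15         # the single long ParNew entry in the GC log

mean_secs = total_parnew_secs / collections
outlier_share = outlier_secs / total_parnew_secs

print(f"mean ParNew pause: {mean_secs:.3f} s")            # 0.141 s
print(f"share taken by the outlier: {outlier_share:.1%}")  # 99.7%
```

Under that assumption, one stalled collection accounts for nearly all of the reported ParNew time, rather than the import making every young collection slow.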
> > Clearly the regionserver spent all its GC time on ParNew, which was not
> > surprising as I was importing tons of data. But I could not figure out why
> > the same GC, which usually takes well under a second, took 299 secs in the
> > third log entry. Any enlightenment is greatly appreciated.
> >
> > I will change the ParNew size to 6M, as documented on the Performance Tuning
> > page, and give it another shot.
> >
> > 2010-07-28T12:06:57.249-0700: 2406.986: [GC 2406.986: [ParNew: 17786K->755K(19136K), 0.0015410 secs] 348288K->331394K(620416K) icms_dc=27 , 0.0016330 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
> > 2010-07-28T12:06:57.268-0700: 2407.004: [GC 2407.004: [ParNew: 17580K->761K(19136K), 0.0016710 secs] 348154K->331343K(620416K) icms_dc=27 , 0.0017610 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
> > 2010-07-28T12:06:57.288-0700: 2407.024: [GC 2407.088: [ParNew: 17564K->757K(19136K), 299.1513910 secs] 348081K->331283K(620416K) icms_dc=27 , 299.1515120 secs] [Times: user=0.17 sys=0.04, real=299.23 secs]
> > 2010-07-28T12:11:56.558-0700: 2706.294: [GC 2706.294: [ParNew: 17735K->925K(19136K), 0.0094600 secs] 348197K->331458K(620416K) icms_dc=27 , 0.0095670 secs] [Times: user=0.01 sys=0.00, real=0.01 secs]
> > 2010-07-28T12:11:56.606-0700: 2706.343: [GC 2706.343: [ParNew: 17940K->932K(19136K), 0.0085750 secs] 348473K->331474K(620416K) icms_dc=27 , 0.0086710 secs] [Times: user=0.01 sys=0.00, real=0.01 secs]
> >
>



-- 
Todd Lipcon
Software Engineer, Cloudera
