hbase-user mailing list archives

From Jean-Marc Spaggiari <jean-m...@spaggiari.org>
Subject Re: RegionServer goes down in Compact SplitThread
Date Thu, 08 Aug 2013 11:55:55 GMT
Hi Prasad,

For such an old version it's a bit difficult to give recommendations. Are
you able to restart your RegionServer, or is it stuck offline because of the
issue it hit? Also, was this the first time you faced this issue? Looking
at the stack trace, it seems that the region server tried to open the same
region twice, at the same time, after a compaction or a split. Has this
region been transitioned to another server now?
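For reference, the `Caused by` frames in the quoted trace show `TaskMonitor.createStatus` calling `add` on a `java.util.SubList`, and the sublist's comodification check throwing. A sublist view throws this when its backing list was structurally modified by anything other than the view itself. The small Java sketch below is a hypothetical demo class, not HBase code. It reproduces that fail-fast check deterministically in a single thread and shows that copying the range into its own `ArrayList` avoids it. In the real trace the modification presumably came from another thread touching the same list while the daughter regions were being opened. That part is an assumption and is not modeled here. Note also that copying alone does not make a list that is shared across threads safe; that would additionally need synchronization.

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.List;

/**
 * Hypothetical demo class (not HBase code): a java.util subList view fails fast
 * once its backing list is structurally modified by anything other than the view.
 */
public class SubListComodDemo {

    /** Builds a small backing list standing in for a shared task list. */
    static List<String> newBacking() {
        List<String> backing = new ArrayList<>();
        for (int i = 0; i < 5; i++) {
            backing.add("task-" + i);
        }
        return backing;
    }

    /** Returns true if adding through a stale subList view throws CME. */
    static boolean staleViewThrows() {
        List<String> backing = newBacking();
        // A live view over elements 0..2; it remembers the backing list's modCount.
        List<String> view = backing.subList(0, 3);

        // Structural change made directly on the backing list, bypassing the view.
        backing.add("task-5");

        try {
            view.add("task-new"); // the view's comodification check fails here
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    /** Copying the range into its own ArrayList decouples it from the backing list. */
    static int copiedRangeSize() {
        List<String> backing = newBacking();
        List<String> copy = new ArrayList<>(backing.subList(0, 3));
        backing.add("task-5"); // no effect on the independent copy
        copy.add("task-new");  // safe: the copy tracks only its own modifications
        return copy.size();    // 3 copied elements + 1 added
    }

    public static void main(String[] args) {
        System.out.println("stale view throws CME: " + staleViewThrows());
        System.out.println("copied range size: " + copiedRangeSize());
    }
}
```

Running `main` prints `stale view throws CME: true` followed by `copied range size: 4`.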

JM

2013/8/8 Prasad GS <gsp200183@gmail.com>

> Hi Jean,
>
> We are planning to move to the latest CDH version in a couple of months,
> but until then we have to maintain the product with CDH3u5. If possible,
> can you provide me with some pointers to look into this issue further?
>
> Regards,
> Skanda
>
>
> On Thu, Aug 8, 2013 at 3:57 PM, Jean-Marc Spaggiari <jean-marc@spaggiari.org> wrote:
>
> > Hi Prasad,
> >
> > 0.90.6 is a pretty old HBase version, and so CDH3u5 is a pretty old CDH
> > version...
> >
> > Any chance to move to a more recent version?
> >
> > JM
> >
> > 2013/8/8 Prasad GS <gsp200183@gmail.com>
> >
> > > Hi,
> > >
> > > We are using the Cloudera CDH3u5 distribution of HBase (0.90.6). The RS
> > > goes down suddenly, and in the region server logs we see the following
> > > exception:
> > >
> > > 2013-08-07 20:36:58,008 INFO org.apache.hadoop.hbase.regionserver.Store: Completed compaction of 18 file(s), new file=hdfs://192.168.0.29:9000/hbase/UsageHistoryMA/1f50c6795c7753315f1fbc04946753d1/d/3311452476716076182, size=320.2m; total size for store is 320.2m
> > > 2013-08-07 20:36:58,008 INFO org.apache.hadoop.hbase.regionserver.HRegion: completed compaction on region UsageHistoryMA,'u\x13\x07\x01\x00\x00\x00\x00 \x12u'X\x83,1375898307352.1f50c6795c7753315f1fbc04946753d1. after 1mins, 51sec
> > > 2013-08-07 20:36:58,009 INFO org.apache.hadoop.hbase.regionserver.SplitTransaction: Starting split of region UsageHistoryMA,'u\x13\x07\x01\x00\x00\x00\x00 \x12u'X\x83,1375898307352.1f50c6795c7753315f1fbc04946753d1.
> > > 2013-08-07 20:36:58,010 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Closing UsageHistoryMA,'u\x13\x07\x01\x00\x00\x00\x00 \x12u'X\x83,1375898307352.1f50c6795c7753315f1fbc04946753d1.: disabling compactions & flushes
> > > 2013-08-07 20:36:58,010 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Updates disabled for region UsageHistoryMA,'u\x13\x07\x01\x00\x00\x00\x00 \x12u'X\x83,1375898307352.1f50c6795c7753315f1fbc04946753d1.
> > > 2013-08-07 20:36:58,010 DEBUG org.apache.hadoop.hbase.regionserver.Store: closed d
> > > 2013-08-07 20:36:58,010 INFO org.apache.hadoop.hbase.regionserver.HRegion: Closed UsageHistoryMA,'u\x13\x07\x01\x00\x00\x00\x00 \x12u'X\x83,1375898307352.1f50c6795c7753315f1fbc04946753d1.
> > > 2013-08-07 20:36:58,029 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Instantiated UsageHistoryMA,'u\x13\x07\x01\x00\x00\x00\x00 \x12u'X\x83,1375900618008.13150e07893adb4eded6d4dc98374e9e.
> > > 2013-08-07 20:36:58,031 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Instantiated UsageHistoryMA,'v\x13\x07\x01\x00\x00\x00\x00 \x12v`\x12\x15,1375900618008.6e9d9b93a9509909ed5c4d9e2bd321a8.
> > > 2013-08-07 20:36:58,038 INFO org.apache.hadoop.hbase.catalog.MetaEditor: Offlined parent region UsageHistoryMA,'u\x13\x07\x01\x00\x00\x00\x00 \x12u'X\x83,1375898307352.1f50c6795c7753315f1fbc04946753d1. in META
> > > 2013-08-07 20:36:58,085 DEBUG org.apache.hadoop.hbase.regionserver.Store: loaded hdfs://192.168.0.29:9000/hbase/UsageHistoryMA/6e9d9b93a9509909ed5c4d9e2bd321a8/d/3311452476716076182.1f50c6795c7753315f1fbc04946753d1, isReference=true, isBulkLoadResult=false, seqid=26966370, majorCompaction=false
> > > 2013-08-07 20:36:58,087 INFO org.apache.hadoop.hbase.regionserver.HRegion: Onlined UsageHistoryMA,'v\x13\x07\x01\x00\x00\x00\x00 \x12v`\x12\x15,1375900618008.6e9d9b93a9509909ed5c4d9e2bd321a8.; next sequenceid=26966371
> > > 2013-08-07 20:36:58,087 DEBUG org.apache.hadoop.hbase.regionserver.CompactSplitThread: Compaction requested for UsageHistoryMA,'v\x13\x07\x01\x00\x00\x00\x00 \x12v`\x12\x15,1375900618008.6e9d9b93a9509909ed5c4d9e2bd321a8. because Region has references on open; priority=99, compaction queue size=18
> > > 2013-08-07 20:36:58,092 INFO org.apache.hadoop.hbase.catalog.MetaEditor: Added daughter UsageHistoryMA,'v\x13\x07\x01\x00\x00\x00\x00 \x12v`\x12\x15,1375900618008.6e9d9b93a9509909ed5c4d9e2bd321a8. in region .META.,,1, serverInfo=dl360x2807,60020,1374636004119
> > > 2013-08-07 20:36:58,093 INFO org.apache.hadoop.hbase.regionserver.CompactSplitThread: Running rollback/cleanup of failed split of UsageHistoryMA,'u\x13\x07\x01\x00\x00\x00\x00 \x12u'X\x83,1375898307352.1f50c6795c7753315f1fbc04946753d1.; Failed dl360x2807,60020,1374636004119-daughterOpener=13150e07893adb4eded6d4dc98374e9e
> > > java.io.IOException: Failed dl360x2807,60020,1374636004119-daughterOpener=13150e07893adb4eded6d4dc98374e9e
> > >         at org.apache.hadoop.hbase.regionserver.SplitTransaction.execute(SplitTransaction.java:307)
> > >         at org.apache.hadoop.hbase.regionserver.CompactSplitThread.split(CompactSplitThread.java:205)
> > >         at org.apache.hadoop.hbase.regionserver.CompactSplitThread.run(CompactSplitThread.java:135)
> > > Caused by: java.util.ConcurrentModificationException
> > >         at java.util.SubList.checkForComodification(AbstractList.java:752)
> > >         at java.util.SubList.size(AbstractList.java:625)
> > >         at java.util.AbstractList.add(AbstractList.java:91)
> > >         at org.apache.hadoop.hbase.monitoring.TaskMonitor.createStatus(TaskMonitor.java:75)
> > >         at org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:346)
> > >         at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:2860)
> > >         at org.apache.hadoop.hbase.regionserver.SplitTransaction.openDaughterRegion(SplitTransaction.java:383)
> > >         at org.apache.hadoop.hbase.regionserver.SplitTransaction$DaughterOpener.run(SplitTransaction.java:352)
> > > 2013-08-07 20:36:58,112 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server serverName=dl360x2807,60020,1374636004119, load=(requests=91, regions=170, usedHeap=7213, maxHeap=32730): Abort; we got an error after point-of-no-return
> > > 2013-08-07 20:36:58,113 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Dump of metrics: requests=30, regions=170, stores=171, storefiles=167, storefileIndexSize=134, memstoreSize=187, mbInMemoryWithoutWAL=0, numberOfPutsWithoutWAL=0, compactionQueueSize=17, flushQueueSize=0, usedHeap=6992, maxHeap=32730, blockCacheSize=3028798008, blockCacheFree=7267346888, blockCacheCount=51548, blockCacheHitCount=55248138, blockCacheMissCount=3593839, blockCacheEvictedCount=0, blockCacheHitRatio=93, blockCacheHitCachingRatio=99
> > > 2013-08-07 20:36:58,119 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: STOPPED: Abort; we got an error after point-of-no-return
> > > 2013-08-07 20:36:58,119 INFO org.apache.hadoop.hbase.regionserver.CompactSplitThread: regionserver60020.compactor exiting
> > > 2013-08-07 20:36:59,161 INFO org.apache.hadoop.ipc.HBaseServer: Stopping server on 60020
> > >
> > > Could someone please let me know why the region split failed and why
> > > the RS went down? To me, the ConcurrentModificationException looks
> > > really trivial.
> > >
> > >
> > > Regards,
> > > Prasad
> > >
> >
>
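On the second question, why the RS went down rather than just failing the split: the quoted log shows the parent being offlined in META and one daughter already onlined before the other daughter failed to open. A rollback was attempted ("Running rollback/cleanup of failed split"), and the abort message says the error came "after point-of-no-return". A minimal, hypothetical Java model of such a journaled split follows. The class, the step names, and exactly where the point of no return sits are illustrative assumptions, not HBase 0.90.6's `SplitTransaction` code. Steps completed before the point of no return can be undone in reverse order. After it, rollback refuses, leaving the caller no safe option but to abort the server.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/**
 * Hypothetical, simplified model of a journaled split (NOT HBase's SplitTransaction).
 * Steps before PONR can be undone; a failure after PONR cannot be rolled back locally.
 */
public class JournaledSplitSketch {

    /** Illustrative step names; their order and the PONR position are assumptions. */
    enum Step { CREATE_SPLIT_DIR, CLOSE_PARENT, PONR, OFFLINE_PARENT_IN_META, OPEN_DAUGHTERS }

    private final Deque<Step> journal = new ArrayDeque<>();

    /** Runs every step in order; failAt (may be null) simulates the step that throws. */
    void execute(Step failAt) throws Exception {
        for (Step s : Step.values()) {
            if (s == failAt) {
                throw new Exception("step failed: " + s);
            }
            journal.push(s); // record completion so rollback can undo in reverse order
        }
    }

    /** Undoes completed steps newest-first; returns false once PONR is reached. */
    boolean rollback() {
        while (!journal.isEmpty()) {
            Step s = journal.pop();
            if (s == Step.PONR) {
                return false; // past the point of no return: leave state for recovery elsewhere
            }
            // A real implementation would undo step s here.
        }
        return true;
    }

    /** Mirrors the caller's decision: roll back if possible, otherwise abort. */
    static String runSplit(Step failAt) {
        JournaledSplitSketch st = new JournaledSplitSketch();
        try {
            st.execute(failAt);
            return "split ok";
        } catch (Exception e) {
            return st.rollback() ? "rolled back" : "abort: error after point-of-no-return";
        }
    }

    public static void main(String[] args) {
        System.out.println(runSplit(null));
        System.out.println(runSplit(Step.CLOSE_PARENT));
        System.out.println(runSplit(Step.OPEN_DAUGHTERS));
    }
}
```

In this model, a failure while opening daughters (`runSplit(Step.OPEN_DAUGHTERS)`) yields the abort outcome, matching the sequence in the quoted log, while a failure while closing the parent is simply rolled back.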
