hbase-user mailing list archives

From Prasad GS <gsp200...@gmail.com>
Subject Re: RegionServer goes down in Compact SplitThread
Date Thu, 08 Aug 2013 13:25:47 GMT
Hi Jean,

Yes, I was able to restart the region server, and this is the first time I
am seeing this issue.
Also, the split regions have been transitioned to another RS. The problem
was that this RS was stuck and did not officially go down. It was also
hosting the .META. table, so HBase became unusable. After the restart,
things are OK now. I always thought that a region split happens after a
major compaction, but per the logs, the compaction request comes after the
split.
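
On the ConcurrentModificationException itself: the trace shows it raised from java.util.SubList.checkForComodification under AbstractList.add, called from TaskMonitor.createStatus. As a rough illustration only (this is not the HBase code, and the class and method names below are made up for the example), a subList view throws this exception whenever its backing list has been structurally modified after the view was created. Whether that is exactly what happened inside TaskMonitor is an assumption on my part.

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.List;

// Hypothetical demo class, not part of HBase.
class SubListCmeDemo {

    /** Returns true if adding through a stale subList view throws CME. */
    static boolean staleViewThrows() {
        List<String> backing = new ArrayList<>();
        backing.add("a");
        backing.add("b");
        backing.add("c");

        // subList returns a view over the backing list, not a copy.
        List<String> view = backing.subList(0, 2);

        // Structural change made directly on the backing list; the view
        // still remembers the old modification count.
        backing.add("d");

        try {
            // add() first asks the view for its size, which checks the
            // modification count and fails.
            view.add("x");
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("stale view throws: " + staleViewThrows());
    }
}
```

Note that no second thread is needed to trigger it; holding on to a subList view while the backing list changes is enough.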

Regards,
Skanda



On Thu, Aug 8, 2013 at 5:25 PM, Jean-Marc Spaggiari <jean-marc@spaggiari.org> wrote:

> Hi Prasad,
>
> For such an old version it's a bit difficult to give recommendations. Are
> you able to restart your RegionServer, or is it stuck offline because of the
> issue it faced? Also, was this the first time you faced this issue? Looking
> at the stack trace, it seems that the region server tried to open the same
> region twice at the same time, after a compaction or a split. Has this
> region been transitioned to another server now?
>
> JM
>
> 2013/8/8 Prasad GS <gsp200183@gmail.com>
>
> > Hi Jean,
> >
> > We are planning to move to the latest CDH version in a couple of months,
> > but until then we have to maintain the product with CDH3u5. If possible,
> > can you provide me with some pointers to look into this issue further?
> >
> > Regards,
> > Skanda
> >
> >
> > On Thu, Aug 8, 2013 at 3:57 PM, Jean-Marc Spaggiari <jean-marc@spaggiari.org> wrote:
> >
> > > Hi Prasad,
> > >
> > > 0.90.6 is a pretty old HBase version, and so CDH3u5 is a pretty old CDH
> > > version...
> > >
> > > Any chance to move to a more recent version?
> > >
> > > JM
> > >
> > > 2013/8/8 Prasad GS <gsp200183@gmail.com>
> > >
> > > > Hi,
> > > >
> > > > We are using the Cloudera CDH3u5 distribution of HBase (0.90.6). The RS
> > > > goes down suddenly, and from the logs we see the following exception in
> > > > the region server:
> > > >
> > > > 2013-08-07 20:36:58,008 INFO org.apache.hadoop.hbase.regionserver.Store: Completed compaction of 18 file(s), new file=hdfs://192.168.0.29:9000/hbase/UsageHistoryMA/1f50c6795c7753315f1fbc04946753d1/d/3311452476716076182, size=320.2m; total size for store is 320.2m
> > > > 2013-08-07 20:36:58,008 INFO org.apache.hadoop.hbase.regionserver.HRegion: completed compaction on region UsageHistoryMA,'u\x13\x07\x01\x00\x00\x00\x00 \x12u'X\x83,1375898307352.1f50c6795c7753315f1fbc04946753d1. after 1mins, 51sec
> > > > 2013-08-07 20:36:58,009 INFO org.apache.hadoop.hbase.regionserver.SplitTransaction: Starting split of region UsageHistoryMA,'u\x13\x07\x01\x00\x00\x00\x00 \x12u'X\x83,1375898307352.1f50c6795c7753315f1fbc04946753d1.
> > > > 2013-08-07 20:36:58,010 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Closing UsageHistoryMA,'u\x13\x07\x01\x00\x00\x00\x00 \x12u'X\x83,1375898307352.1f50c6795c7753315f1fbc04946753d1.: disabling compactions & flushes
> > > > 2013-08-07 20:36:58,010 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Updates disabled for region UsageHistoryMA,'u\x13\x07\x01\x00\x00\x00\x00 \x12u'X\x83,1375898307352.1f50c6795c7753315f1fbc04946753d1.
> > > > 2013-08-07 20:36:58,010 DEBUG org.apache.hadoop.hbase.regionserver.Store: closed d
> > > > 2013-08-07 20:36:58,010 INFO org.apache.hadoop.hbase.regionserver.HRegion: Closed UsageHistoryMA,'u\x13\x07\x01\x00\x00\x00\x00 \x12u'X\x83,1375898307352.1f50c6795c7753315f1fbc04946753d1.
> > > > 2013-08-07 20:36:58,029 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Instantiated UsageHistoryMA,'u\x13\x07\x01\x00\x00\x00\x00 \x12u'X\x83,1375900618008.13150e07893adb4eded6d4dc98374e9e.
> > > > 2013-08-07 20:36:58,031 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Instantiated UsageHistoryMA,'v\x13\x07\x01\x00\x00\x00\x00 \x12v`\x12\x15,1375900618008.6e9d9b93a9509909ed5c4d9e2bd321a8.
> > > > 2013-08-07 20:36:58,038 INFO org.apache.hadoop.hbase.catalog.MetaEditor: Offlined parent region UsageHistoryMA,'u\x13\x07\x01\x00\x00\x00\x00 \x12u'X\x83,1375898307352.1f50c6795c7753315f1fbc04946753d1. in META
> > > > 2013-08-07 20:36:58,085 DEBUG org.apache.hadoop.hbase.regionserver.Store: loaded hdfs://192.168.0.29:9000/hbase/UsageHistoryMA/6e9d9b93a9509909ed5c4d9e2bd321a8/d/3311452476716076182.1f50c6795c7753315f1fbc04946753d1, isReference=true, isBulkLoadResult=false, seqid=26966370, majorCompaction=false
> > > > 2013-08-07 20:36:58,087 INFO org.apache.hadoop.hbase.regionserver.HRegion: Onlined UsageHistoryMA,'v\x13\x07\x01\x00\x00\x00\x00 \x12v`\x12\x15,1375900618008.6e9d9b93a9509909ed5c4d9e2bd321a8.; next sequenceid=26966371
> > > > 2013-08-07 20:36:58,087 DEBUG org.apache.hadoop.hbase.regionserver.CompactSplitThread: Compaction requested for UsageHistoryMA,'v\x13\x07\x01\x00\x00\x00\x00 \x12v`\x12\x15,1375900618008.6e9d9b93a9509909ed5c4d9e2bd321a8. because Region has references on open; priority=99, compaction queue size=18
> > > > 2013-08-07 20:36:58,092 INFO org.apache.hadoop.hbase.catalog.MetaEditor: Added daughter UsageHistoryMA,'v\x13\x07\x01\x00\x00\x00\x00 \x12v`\x12\x15,1375900618008.6e9d9b93a9509909ed5c4d9e2bd321a8. in region .META.,,1, serverInfo=dl360x2807,60020,1374636004119
> > > > 2013-08-07 20:36:58,093 INFO org.apache.hadoop.hbase.regionserver.CompactSplitThread: Running rollback/cleanup of failed split of UsageHistoryMA,'u\x13\x07\x01\x00\x00\x00\x00 \x12u'X\x83,1375898307352.1f50c6795c7753315f1fbc04946753d1.; Failed dl360x2807,60020,1374636004119-daughterOpener=13150e07893adb4eded6d4dc98374e9e
> > > > java.io.IOException: Failed dl360x2807,60020,1374636004119-daughterOpener=13150e07893adb4eded6d4dc98374e9e
> > > >         at org.apache.hadoop.hbase.regionserver.SplitTransaction.execute(SplitTransaction.java:307)
> > > >         at org.apache.hadoop.hbase.regionserver.CompactSplitThread.split(CompactSplitThread.java:205)
> > > >         at org.apache.hadoop.hbase.regionserver.CompactSplitThread.run(CompactSplitThread.java:135)
> > > > Caused by: java.util.ConcurrentModificationException
> > > >         at java.util.SubList.checkForComodification(AbstractList.java:752)
> > > >         at java.util.SubList.size(AbstractList.java:625)
> > > >         at java.util.AbstractList.add(AbstractList.java:91)
> > > >         at org.apache.hadoop.hbase.monitoring.TaskMonitor.createStatus(TaskMonitor.java:75)
> > > >         at org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:346)
> > > >         at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:2860)
> > > >         at org.apache.hadoop.hbase.regionserver.SplitTransaction.openDaughterRegion(SplitTransaction.java:383)
> > > >         at org.apache.hadoop.hbase.regionserver.SplitTransaction$DaughterOpener.run(SplitTransaction.java:352)
> > > > 2013-08-07 20:36:58,112 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server serverName=dl360x2807,60020,1374636004119, load=(requests=91, regions=170, usedHeap=7213, maxHeap=32730): Abort; we got an error after point-of-no-return
> > > > 2013-08-07 20:36:58,113 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Dump of metrics: requests=30, regions=170, stores=171, storefiles=167, storefileIndexSize=134, memstoreSize=187, mbInMemoryWithoutWAL=0, numberOfPutsWithoutWAL=0, compactionQueueSize=17, flushQueueSize=0, usedHeap=6992, maxHeap=32730, blockCacheSize=3028798008, blockCacheFree=7267346888, blockCacheCount=51548, blockCacheHitCount=55248138, blockCacheMissCount=3593839, blockCacheEvictedCount=0, blockCacheHitRatio=93, blockCacheHitCachingRatio=99
> > > > 2013-08-07 20:36:58,119 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: STOPPED: Abort; we got an error after point-of-no-return
> > > > 2013-08-07 20:36:58,119 INFO org.apache.hadoop.hbase.regionserver.CompactSplitThread: regionserver60020.compactor exiting
> > > > 2013-08-07 20:36:59,161 INFO org.apache.hadoop.ipc.HBaseServer: Stopping server on 60020
> > > >
> > > > Could someone please let me know why the region split failed and why
> > > > the RS went down? To me, the ConcurrentModificationException looks
> > > > really trivial.
> > > >
> > > >
> > > > Regards,
> > > > Prasad
> > > >
