hbase-user mailing list archives

From Prasad GS <gsp200...@gmail.com>
Subject RegionServer goes down in Compact SplitThread
Date Thu, 08 Aug 2013 06:36:21 GMT
Hi,

We are using the Cloudera CDH3u5 distribution of HBase (0.90.6). The region
server (RS) goes down suddenly, and its log shows the following exception:

2013-08-07 20:36:58,008 INFO org.apache.hadoop.hbase.regionserver.Store:
Completed compaction of 18 file(s), new file=hdfs://
192.168.0.29:9000/hbase/UsageHistoryMA/1f50c6795c7753315f1fbc04946753d1/d/3311452476716076182,
size=320.2m; total size for store is 320.2m
2013-08-07 20:36:58,008 INFO org.apache.hadoop.hbase.regionserver.HRegion:
completed compaction on region
UsageHistoryMA,'u\x13\x07\x01\x00\x00\x00\x00
\x12u'X\x83,1375898307352.1f50c6795c7753315f1fbc04946753d1. after 1mins,
51sec
2013-08-07 20:36:58,009 INFO
org.apache.hadoop.hbase.regionserver.SplitTransaction: Starting split of
region UsageHistoryMA,'u\x13\x07\x01\x00\x00\x00\x00
\x12u'X\x83,1375898307352.1f50c6795c7753315f1fbc04946753d1.
2013-08-07 20:36:58,010 DEBUG org.apache.hadoop.hbase.regionserver.HRegion:
Closing UsageHistoryMA,'u\x13\x07\x01\x00\x00\x00\x00
\x12u'X\x83,1375898307352.1f50c6795c7753315f1fbc04946753d1.: disabling
compactions & flushes
2013-08-07 20:36:58,010 DEBUG org.apache.hadoop.hbase.regionserver.HRegion:
Updates disabled for region UsageHistoryMA,'u\x13\x07\x01\x00\x00\x00\x00
\x12u'X\x83,1375898307352.1f50c6795c7753315f1fbc04946753d1.
2013-08-07 20:36:58,010 DEBUG org.apache.hadoop.hbase.regionserver.Store:
closed d
2013-08-07 20:36:58,010 INFO org.apache.hadoop.hbase.regionserver.HRegion:
Closed UsageHistoryMA,'u\x13\x07\x01\x00\x00\x00\x00
\x12u'X\x83,1375898307352.1f50c6795c7753315f1fbc04946753d1.
2013-08-07 20:36:58,029 DEBUG org.apache.hadoop.hbase.regionserver.HRegion:
Instantiated UsageHistoryMA,'u\x13\x07\x01\x00\x00\x00\x00
\x12u'X\x83,1375900618008.13150e07893adb4eded6d4dc98374e9e.
2013-08-07 20:36:58,031 DEBUG org.apache.hadoop.hbase.regionserver.HRegion:
Instantiated UsageHistoryMA,'v\x13\x07\x01\x00\x00\x00\x00
\x12v`\x12\x15,1375900618008.6e9d9b93a9509909ed5c4d9e2bd321a8.
2013-08-07 20:36:58,038 INFO org.apache.hadoop.hbase.catalog.MetaEditor:
Offlined parent region UsageHistoryMA,'u\x13\x07\x01\x00\x00\x00\x00
\x12u'X\x83,1375898307352.1f50c6795c7753315f1fbc04946753d1. in META
2013-08-07 20:36:58,085 DEBUG org.apache.hadoop.hbase.regionserver.Store:
loaded hdfs://
192.168.0.29:9000/hbase/UsageHistoryMA/6e9d9b93a9509909ed5c4d9e2bd321a8/d/3311452476716076182.1f50c6795c7753315f1fbc04946753d1,
isReference=true, isBulkLoadResult=false, seqid=26966370,
majorCompaction=false
2013-08-07 20:36:58,087 INFO org.apache.hadoop.hbase.regionserver.HRegion:
Onlined UsageHistoryMA,'v\x13\x07\x01\x00\x00\x00\x00
\x12v`\x12\x15,1375900618008.6e9d9b93a9509909ed5c4d9e2bd321a8.; next
sequenceid=26966371
2013-08-07 20:36:58,087 DEBUG
org.apache.hadoop.hbase.regionserver.CompactSplitThread: Compaction
requested for UsageHistoryMA,'v\x13\x07\x01\x00\x00\x00\x00
\x12v`\x12\x15,1375900618008.6e9d9b93a9509909ed5c4d9e2bd321a8. because
Region has references on open; priority=99, compaction queue size=18
2013-08-07 20:36:58,092 INFO org.apache.hadoop.hbase.catalog.MetaEditor:
Added daughter UsageHistoryMA,'v\x13\x07\x01\x00\x00\x00\x00
\x12v`\x12\x15,1375900618008.6e9d9b93a9509909ed5c4d9e2bd321a8. in region
.META.,,1, serverInfo=dl360x2807,60020,1374636004119
2013-08-07 20:36:58,093 INFO
org.apache.hadoop.hbase.regionserver.CompactSplitThread: Running
rollback/cleanup of failed split of
UsageHistoryMA,'u\x13\x07\x01\x00\x00\x00\x00
\x12u'X\x83,1375898307352.1f50c6795c7753315f1fbc04946753d1.; Failed
dl360x2807,60020,1374636004119-daughterOpener=13150e07893adb4eded6d4dc98374e9e

java.io.IOException: Failed dl360x2807,60020,1374636004119-daughterOpener=13150e07893adb4eded6d4dc98374e9e
        at org.apache.hadoop.hbase.regionserver.SplitTransaction.execute(SplitTransaction.java:307)
        at org.apache.hadoop.hbase.regionserver.CompactSplitThread.split(CompactSplitThread.java:205)
        at org.apache.hadoop.hbase.regionserver.CompactSplitThread.run(CompactSplitThread.java:135)
Caused by: java.util.ConcurrentModificationException
        at java.util.SubList.checkForComodification(AbstractList.java:752)
        at java.util.SubList.size(AbstractList.java:625)
        at java.util.AbstractList.add(AbstractList.java:91)
        at org.apache.hadoop.hbase.monitoring.TaskMonitor.createStatus(TaskMonitor.java:75)
        at org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:346)
        at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:2860)
        at org.apache.hadoop.hbase.regionserver.SplitTransaction.openDaughterRegion(SplitTransaction.java:383)
        at org.apache.hadoop.hbase.regionserver.SplitTransaction$DaughterOpener.run(SplitTransaction.java:352)

2013-08-07 20:36:58,112 FATAL
org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server
serverName=dl360x2807,60020,1374636004119, load=(requests=91, regions=170,
usedHeap=7213, maxHeap=32730): Abort; we got an error after
point-of-no-return
2013-08-07 20:36:58,113 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: Dump of metrics:
requests=30, regions=170, stores=171, storefiles=167,
storefileIndexSize=134, memstoreSize=187, mbInMemoryWithoutWAL=0,
numberOfPutsWithoutWAL=0, compactionQueueSize=17, flushQueueSize=0,
usedHeap=6992, maxHeap=32730, blockCacheSize=3028798008,
blockCacheFree=7267346888, blockCacheCount=51548,
blockCacheHitCount=55248138, blockCacheMissCount=3593839,
blockCacheEvictedCount=0, blockCacheHitRatio=93,
blockCacheHitCachingRatio=99
2013-08-07 20:36:58,119 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: STOPPED: Abort; we got
an error after point-of-no-return
2013-08-07 20:36:58,119 INFO
org.apache.hadoop.hbase.regionserver.CompactSplitThread:
regionserver60020.compactor exiting
2013-08-07 20:36:59,161 INFO org.apache.hadoop.ipc.HBaseServer: Stopping
server on 60020

Could someone please explain why the region split failed and why the RS
went down? To me, the ConcurrentModificationException looks really trivial.
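
For reference, the "Caused by" frames (SubList.checkForComodification
called from AbstractList.add) match a general JDK behavior: a subList view
throws ConcurrentModificationException once its backing list has been
structurally modified through any other path. The sketch below only
reproduces that JDK behavior in a single thread. It is not the HBase
TaskMonitor code; the class name SubListCmeDemo, the method staleViewThrows
and the "task-N" strings are made up for illustration. Whether the change
in the real case came from another thread opening the other daughter region
is an assumption the logs do not confirm.

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.List;

// Illustrative only: shows the JDK subList behavior, not HBase's TaskMonitor.
public class SubListCmeDemo {

    /** Returns true if adding through a stale subList view throws a CME. */
    static boolean staleViewThrows() {
        List<String> backing = new ArrayList<>();
        backing.add("task-1");
        backing.add("task-2");
        backing.add("task-3");

        // A view over the last two elements; it records the backing list's
        // modification count at creation time.
        List<String> view = backing.subList(1, 3);

        // Structural change made directly on the backing list, not via the view.
        backing.add("task-4");

        try {
            // add(E) asks the view for its size first, which checks the
            // recorded modification count against the backing list.
            view.add("task-5");
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("stale subList view throws CME: " + staleViewThrows());
    }
}
```

The usual ways to avoid this are to copy the range into a fresh list
(new ArrayList<>(backing.subList(1, 3))) instead of keeping the view, and to
guard every access to a shared list with the same lock.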


Regards,
Prasad
