Date: Thu, 8 Aug 2013 13:01:28 +0530
Subject: RegionServer goes down in CompactSplitThread
From: Prasad GS
To: user@hbase.apache.org

Hi,

We are using the Cloudera CDH3u5 distribution of HBase (0.90.6). The RegionServer (RS) goes down suddenly, and its logs show the following exception:

2013-08-07 20:36:58,008 INFO org.apache.hadoop.hbase.regionserver.Store: Completed compaction of 18 file(s), new file=hdfs://192.168.0.29:9000/hbase/UsageHistoryMA/1f50c6795c7753315f1fbc04946753d1/d/3311452476716076182, size=320.2m; total size for store is 320.2m
2013-08-07 20:36:58,008 INFO org.apache.hadoop.hbase.regionserver.HRegion: completed compaction on region UsageHistoryMA,'u\x13\x07\x01\x00\x00\x00\x00 \x12u'X\x83,1375898307352.1f50c6795c7753315f1fbc04946753d1. after 1mins, 51sec
2013-08-07 20:36:58,009 INFO org.apache.hadoop.hbase.regionserver.SplitTransaction: Starting split of region UsageHistoryMA,'u\x13\x07\x01\x00\x00\x00\x00 \x12u'X\x83,1375898307352.1f50c6795c7753315f1fbc04946753d1.
2013-08-07 20:36:58,010 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Closing UsageHistoryMA,'u\x13\x07\x01\x00\x00\x00\x00 \x12u'X\x83,1375898307352.1f50c6795c7753315f1fbc04946753d1.: disabling compactions & flushes
2013-08-07 20:36:58,010 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Updates disabled for region UsageHistoryMA,'u\x13\x07\x01\x00\x00\x00\x00 \x12u'X\x83,1375898307352.1f50c6795c7753315f1fbc04946753d1.
2013-08-07 20:36:58,010 DEBUG org.apache.hadoop.hbase.regionserver.Store: closed d
2013-08-07 20:36:58,010 INFO org.apache.hadoop.hbase.regionserver.HRegion: Closed UsageHistoryMA,'u\x13\x07\x01\x00\x00\x00\x00 \x12u'X\x83,1375898307352.1f50c6795c7753315f1fbc04946753d1.
2013-08-07 20:36:58,029 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Instantiated UsageHistoryMA,'u\x13\x07\x01\x00\x00\x00\x00 \x12u'X\x83,1375900618008.13150e07893adb4eded6d4dc98374e9e.
2013-08-07 20:36:58,031 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Instantiated UsageHistoryMA,'v\x13\x07\x01\x00\x00\x00\x00 \x12v`\x12\x15,1375900618008.6e9d9b93a9509909ed5c4d9e2bd321a8.
2013-08-07 20:36:58,038 INFO org.apache.hadoop.hbase.catalog.MetaEditor: Offlined parent region UsageHistoryMA,'u\x13\x07\x01\x00\x00\x00\x00 \x12u'X\x83,1375898307352.1f50c6795c7753315f1fbc04946753d1. in META
2013-08-07 20:36:58,085 DEBUG org.apache.hadoop.hbase.regionserver.Store: loaded hdfs://192.168.0.29:9000/hbase/UsageHistoryMA/6e9d9b93a9509909ed5c4d9e2bd321a8/d/3311452476716076182.1f50c6795c7753315f1fbc04946753d1, isReference=true, isBulkLoadResult=false, seqid=26966370, majorCompaction=false
2013-08-07 20:36:58,087 INFO org.apache.hadoop.hbase.regionserver.HRegion: Onlined UsageHistoryMA,'v\x13\x07\x01\x00\x00\x00\x00 \x12v`\x12\x15,1375900618008.6e9d9b93a9509909ed5c4d9e2bd321a8.; next sequenceid=26966371
2013-08-07 20:36:58,087 DEBUG org.apache.hadoop.hbase.regionserver.CompactSplitThread: Compaction requested for UsageHistoryMA,'v\x13\x07\x01\x00\x00\x00\x00 \x12v`\x12\x15,1375900618008.6e9d9b93a9509909ed5c4d9e2bd321a8. because Region has references on open; priority=99, compaction queue size=18
2013-08-07 20:36:58,092 INFO org.apache.hadoop.hbase.catalog.MetaEditor: Added daughter UsageHistoryMA,'v\x13\x07\x01\x00\x00\x00\x00 \x12v`\x12\x15,1375900618008.6e9d9b93a9509909ed5c4d9e2bd321a8. in region .META.,,1, serverInfo=dl360x2807,60020,1374636004119
2013-08-07 20:36:58,093 INFO org.apache.hadoop.hbase.regionserver.CompactSplitThread: Running rollback/cleanup of failed split of UsageHistoryMA,'u\x13\x07\x01\x00\x00\x00\x00 \x12u'X\x83,1375898307352.1f50c6795c7753315f1fbc04946753d1.; Failed dl360x2807,60020,1374636004119-daughterOpener=13150e07893adb4eded6d4dc98374e9e
java.io.IOException: Failed dl360x2807,60020,1374636004119-daughterOpener=13150e07893adb4eded6d4dc98374e9e
        at org.apache.hadoop.hbase.regionserver.SplitTransaction.execute(SplitTransaction.java:307)
        at org.apache.hadoop.hbase.regionserver.CompactSplitThread.split(CompactSplitThread.java:205)
        at org.apache.hadoop.hbase.regionserver.CompactSplitThread.run(CompactSplitThread.java:135)
Caused by: java.util.ConcurrentModificationException
        at java.util.SubList.checkForComodification(AbstractList.java:752)
        at java.util.SubList.size(AbstractList.java:625)
        at java.util.AbstractList.add(AbstractList.java:91)
        at org.apache.hadoop.hbase.monitoring.TaskMonitor.createStatus(TaskMonitor.java:75)
        at org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:346)
        at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:2860)
        at org.apache.hadoop.hbase.regionserver.SplitTransaction.openDaughterRegion(SplitTransaction.java:383)
        at org.apache.hadoop.hbase.regionserver.SplitTransaction$DaughterOpener.run(SplitTransaction.java:352)
2013-08-07 20:36:58,112 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server serverName=dl360x2807,60020,1374636004119, load=(requests=91, regions=170, usedHeap=7213, maxHeap=32730): Abort; we got an error after point-of-no-return
2013-08-07 20:36:58,113 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Dump of metrics: requests=30, regions=170, stores=171, storefiles=167, storefileIndexSize=134, memstoreSize=187, mbInMemoryWithoutWAL=0, numberOfPutsWithoutWAL=0, compactionQueueSize=17, flushQueueSize=0, usedHeap=6992, maxHeap=32730, blockCacheSize=3028798008, blockCacheFree=7267346888, blockCacheCount=51548, blockCacheHitCount=55248138, blockCacheMissCount=3593839, blockCacheEvictedCount=0, blockCacheHitRatio=93, blockCacheHitCachingRatio=99
2013-08-07 20:36:58,119 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: STOPPED: Abort; we got an error after point-of-no-return
2013-08-07 20:36:58,119 INFO org.apache.hadoop.hbase.regionserver.CompactSplitThread: regionserver60020.compactor exiting
2013-08-07 20:36:59,161 INFO org.apache.hadoop.ipc.HBaseServer: Stopping server on 60020

Could someone please explain why the region split failed and why the RS went down? To me, the ConcurrentModificationException looks really trivial.

Regards,
Prasad
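
P.S. For anyone reading the "Caused by" frames in the trace (SubList.checkForComodification, called from SubList.size and AbstractList.add), here is a minimal, standalone Java sketch of the general JDK behaviour that produces this kind of exception. It is not HBase's TaskMonitor code, and I am only assuming that something similar happens there. The class name SubListCmeSketch and the "task-N" strings are made up for illustration. The sketch takes a subList view of a list, structurally modifies the backing list, and then calls add() on the stale view.

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.List;

// Hypothetical illustration only; this is NOT taken from the HBase sources.
public class SubListCmeSketch {

    /** Returns true if add() on a stale subList view throws ConcurrentModificationException. */
    public static boolean staleViewThrows() {
        List<String> backing = new ArrayList<>();
        for (int i = 0; i < 5; i++) {
            backing.add("task-" + i);
        }

        // subList returns a view onto the backing list, not an independent copy.
        List<String> view = backing.subList(2, 5);

        // A structural change made directly on the backing list invalidates the view.
        backing.add("task-5");

        try {
            // add() first asks the view for its size, which checks for comodification.
            view.add("task-6");
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("stale view throws CME: " + staleViewThrows());
    }
}
```

The sketch needs only one thread: the exception signals that the view and its backing list disagree about modifications, which can also happen when two threads touch the same list without synchronization.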