Return-Path: X-Original-To: apmail-hbase-issues-archive@www.apache.org Delivered-To: apmail-hbase-issues-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 85F9718B72 for ; Thu, 25 Jun 2015 11:10:05 +0000 (UTC) Received: (qmail 52383 invoked by uid 500); 25 Jun 2015 11:10:05 -0000 Delivered-To: apmail-hbase-issues-archive@hbase.apache.org Received: (qmail 52337 invoked by uid 500); 25 Jun 2015 11:10:05 -0000 Mailing-List: contact issues-help@hbase.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Delivered-To: mailing list issues@hbase.apache.org Received: (qmail 52058 invoked by uid 99); 25 Jun 2015 11:10:05 -0000 Received: from arcas.apache.org (HELO arcas.apache.org) (140.211.11.28) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 25 Jun 2015 11:10:05 +0000 Date: Thu, 25 Jun 2015 11:10:05 +0000 (UTC) From: "ramkrishna.s.vasudevan (JIRA)" To: issues@hbase.apache.org Message-ID: In-Reply-To: References: Subject: [jira] [Commented] (HBASE-13970) NPE during compaction in trunk MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 [ https://issues.apache.org/jira/browse/HBASE-13970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14601010#comment-14601010 ] ramkrishna.s.vasudevan commented on HBASE-13970: ------------------------------------------------ The reason here is that there are 2 compactions getting triggered from the CompactionSplitThread on a region that is newly split. One may be due to the split that happened? So when both the compaction goes parallely then the PressureAwareCompactionController is started by the Compactor thread {code} @Override public void start(String compactionName) { activeCompactions.put(compactionName, new ActiveCompaction()); } {code} Now by the time the second compaction is in progress the first compaction completes and does a finish {code} @Override public void finish(String compactionName) { ActiveCompaction compaction = activeCompactions.remove(compactionName); long elapsedTime = Math.max(1, EnvironmentEdgeManager.currentTime() - compaction.startTime); ..... {code} The compactionName is going to be common here because {code} String compactionName = store.getRegionInfo().getRegionNameAsString() + "#" + store.getFamily().getNameAsString(); {code} When the second compaction comes for completion the compaction has already been removed and hence we get an NPE. Logs One compaction started with 4 files {code} 2015-06-25 22:07:49,135 INFO [regionserver/stobdtserver3/10.224.54.70:16040-shortCompactions-1435250269126] regionserver.HRegion: Starting compaction on info in region TestTable,00000000000000000000283887,1435250266343.5eb54f001fd85035ab448f44d049ab84. 2015-06-25 22:07:49,135 INFO [regionserver/stobdtserver3/10.224.54.70:16040-shortCompactions-1435250269126] regionserver.HStore: Starting compaction of 4 file(s) in info of TestTable,00000000000000000000283887,1435250266343.5eb54f001fd85035ab448f44d049ab84. into tmpdir=hdfs://stobdtserver3:9010/hbase/data/default/TestTable/5eb54f001fd85035ab448f44d049ab84/.tmp, totalSize=285.6 M 2015-06-25 22:07:49,165 INFO [regionserver/stobdtserver3/10.224.54.70:16040-shortCompactions-1435250269126] hfile.CacheConfig: blockCache=org.apache.hadoop.hbase.io.hfile.CombinedBlockCache@71f1ce16, cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, prefetchOnOpen=false 2015-06-25 22:07:49,954 INFO [regionserver/stobdtserver3/10.224.54.70:16040.logRoller] wal.FSHLog: Rolled WAL /hbase/WALs/stobdtserver3,16040,1435250244539/stobdtserver3%2C16040%2C1435250244539.default.1435250268365 with entries=90, filesize=124.14 MB; new WAL /hbase/WALs/stobdtserver3,16040,1435250244539/stobdtserver3%2C16040%2C1435250244539.default.1435250269933 20 {code} Another compaction has been started with 3 files {code} 2015-06-25 22:07:53,405 INFO [regionserver/stobdtserver3/10.224.54.70:16040-longCompactions-1435250245998] regionserver.HRegion: Starting compaction on info in region TestTable,00000000000000000000283887,1435250266343.5eb54f001fd85035ab448f44d049ab84. 2015-06-25 22:07:53,406 INFO [regionserver/stobdtserver3/10.224.54.70:16040-longCompactions-1435250245998] regionserver.HStore: Starting compaction of 3 file(s) in info of TestTable,00000000000000000000283887,1435250266343.5eb54f001fd85035ab448f44d049ab84. into tmpdir=hdfs://stobdtserver3:9010/hbase/data/default/TestTable/5eb54f001fd85035ab448f44d049ab84/.tmp, totalSize=343.4 M 2015-06-25 22:07:53,411 INFO [regionserver/stobdtserver3/10.224.54.70:16040-longCompactions-1435250245998] hfile.CacheConfig: blockCache=org.apache.hadoop.hbase.io.hfile.CombinedBlockCache@71f1ce16, cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, prefetchOnOpen=false 2015-06-25 22:07:54,211 INFO [MemStoreFlusher.1] regionserver.HRegion: Flushing 1/1 column families, memstore=128.23 MB 2015-06-25 22:07:54,639 INFO [regionserver/stobdtserver3/10.224.54.70:16040.logRoller] wal.FSHLog: Rolled WAL /hbase/WALs/stobdtserver3,16040,1435250244539/stobdtserver3%2C16040%2C1435250244539.default.1435250273034 with entries=90, filesize= {code} {code} 2015-06-25 22:08:09,446 INFO [regionserver/stobdtserver3/10.224.54.70:16040-shortCompactions-1435250269126] compactions.PressureAwareCompactionThroughputController: TestTable,00000000000000000000283887,1435250266343.5eb54f001fd85035ab448f44d049ab84.#info average throughput is 19.80 MB/sec, slept 30 time(s) and total slept time is 27694 ms. 0 active compactions remaining, total limit is 10.00 MB/sec 2015-06-25 22:08:09,520 INFO [regionserver/stobdtserver3/10.224.54.70:16040-shortCompactions-1435250269126] regionserver.HStore: Completed compaction of 4 (all) file(s) in info of TestTable,00000000000000000000283887,1435250266343.5eb54f001fd85035ab448f44d049ab84. into c044bd80b8684d97bb8ed2552687d22a(size=228.4 M), total size for store is 686.2 M. This selection was in queue for 0sec, and took 20sec to execute. 2015-06-25 22:08:09,523 INFO [regionserver/stobdtserver3/10.224.54.70:16040-shortCompactions-1435250269126] regionserver.CompactSplitThread: Completed compaction: Request = regionName=TestTable,00000000000000000000283887,1435250266343.5eb54f001fd85035ab448f44d049ab84., storeName=info, fileCount=4, fileSize=285.6 M, priority=6, time=7539031296069589; duration=20sec 2015-06-25 22:08:09,858 ERROR [regionserver/stobdtserver3/10.224.54.70:16040-longCompactions-1435250245998] regionserver.CompactSplitThread: Compaction failed Request = regionName=TestTable,00000000000000000000283887,1435250266343.5eb54f001fd85035ab448f44d049ab84., storeName=info, fileCount=3, fileSize=343.4 M (114.5 M, 114.5 M, 114.5 M), priority=3, time=7539035566509573 java.lang.NullPointerException at org.apache.hadoop.hbase.regionserver.compactions.PressureAwareCompactionThroughputController$ActiveCompaction.access$700(PressureAwareCompactionThroughputController.java:79) at org.apache.hadoop.hbase.regionserver.compactions.PressureAwareCompactionThroughputController.finish(PressureAwareCompactionThroughputController.java:238) at org.apache.hadoop.hbase.regionserver.compactions.Compactor.performCompaction(Compactor.java:306) at org.apache.hadoop.hbase.regionserver.compactions.DefaultCompactor.compact(DefaultCompactor.java:106) at org.apache.hadoop.hbase.regionserver.DefaultStoreEngine$DefaultCompactionContext.compact(DefaultStoreEngine.java:112) at org.apache.hadoop.hbase.regionserver.HStore.compact(HStore.java:1202) at org.apache.hadoop.hbase.regionserver.HRegion.compact(HRegion.java:1792) at org.apache.hadoop.hbase.regionserver.CompactSplitThread$CompactionRunner.run(CompactSplitThread.java:524) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) 2015-06-25 22:12:26,339 INFO [LruBlockCacheStatsExecutor] hfile.LruBlockCache: totalSize=1.83 MB, freeSize=1.74 GB, max=1.74 GB, blockCount=0, accesses=0, hits=0, hitRatio=0, cachingAccesses=0, cachingHits=0, cachingHitsRatio=0,evictions=2 {code} Both gets completed around the same time and hence you can see that the compaction with 4 files was successful but the one with 3 got failed with NPE. Though the compaction was actually done but it got stopped due to NPE. This is a race condition - so the failed compacted files may be get compacated in the next run of compaction when being selected. > NPE during compaction in trunk > ------------------------------ > > Key: HBASE-13970 > URL: https://issues.apache.org/jira/browse/HBASE-13970 > Project: HBase > Issue Type: Bug > Affects Versions: 2.0.0 > Reporter: ramkrishna.s.vasudevan > Assignee: ramkrishna.s.vasudevan > Fix For: 2.0.0 > > > Updated the trunk.. Loaded the table with PE tool. Trigger a flush to ensure all data is flushed out to disk. When the first compaction is triggered we get an NPE and this is very easy to reproduce > {code} > 015-06-25 21:33:46,041 INFO [main-EventThread] procedure.ZKProcedureMemberRpcs: Received procedure start children changed event: /hbase/flush-table-proc/acquired > 2015-06-25 21:33:46,051 INFO [rs(stobdtserver3,16040,1435248182301)-flush-proc-pool3-thread-1] regionserver.HRegion: Flushing 1/1 column families, memstore=76.91 MB > 2015-06-25 21:33:46,159 ERROR [regionserver/stobdtserver3/10.224.54.70:16040-longCompactions-1435248183945] regionserver.CompactSplitThread: Compaction failed Request = regionName=TestTable,00000000000000000000283887,1435248198798.028fb0324cd6eb03d5022eb8c147b7c4., storeName=info, fileCount=3, fileSize=343.4 M (114.5 M, 114.5 M, 114.5 M), priority=3, time=7536968291719985 > java.lang.NullPointerException > at org.apache.hadoop.hbase.regionserver.compactions.PressureAwareCompactionThroughputController$ActiveCompaction.access$700(PressureAwareCompactionThroughputController.java:79) > at org.apache.hadoop.hbase.regionserver.compactions.PressureAwareCompactionThroughputController.finish(PressureAwareCompactionThroughputController.java:238) > at org.apache.hadoop.hbase.regionserver.compactions.Compactor.performCompaction(Compactor.java:306) > at org.apache.hadoop.hbase.regionserver.compactions.DefaultCompactor.compact(DefaultCompactor.java:106) > at org.apache.hadoop.hbase.regionserver.DefaultStoreEngine$DefaultCompactionContext.compact(DefaultStoreEngine.java:112) > at org.apache.hadoop.hbase.regionserver.HStore.compact(HStore.java:1202) > at org.apache.hadoop.hbase.regionserver.HRegion.compact(HRegion.java:1792) > at org.apache.hadoop.hbase.regionserver.CompactSplitThread$CompactionRunner.run(CompactSplitThread.java:524) > at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > 2015-06-25 21:33:46,745 INFO [rs(stobdtserver3,16040,1435248182301)-flush-proc-pool3-thread-1] regionserver.DefaultStoreFlusher: Flushed, sequenceid=1534, memsize=76.9 M, hasBloomFilter=true, into tmp file hdfs://stobdtserver3:9010/hbase/data/default/TestTable/028fb0324cd6eb03d5022eb8c147b7c4/.tmp/942ba0831a0047a08987439e34361a0c > 2015-06-25 21:33:46,772 INFO [rs(stobdtserver3,16040,1435248182301)-flush-proc-pool3-thread-1] regionserver.HStore: Added hdfs://stobdtserver3:9010/hbase/data/default/TestTable/028fb0324cd6eb03d5022eb8c147b7c4/info/942ba0831a0047a08987439e34361a0c, entries=68116, sequenceid=1534, filesize=68.7 M > 2015-06-25 21:33:46,773 INFO [rs(stobdtserver3,16040,1435248182301)-flush-proc-pool3-thread-1] regionserver.HRegion: Finished memstore flush of ~76.91 MB/80649344, currentsize=0 B/0 for region TestTable,00000000000000000000283887,1435248198798.028fb0324cd6eb03d5022eb8c147b7c4. in 723ms, sequenceid=1534, compaction requested=true > 2015-06-25 21:33:46,780 INFO [main-EventThread] procedure.ZKProcedureMemberRpcs: Received created event:/hbase/flush-table-proc/reached/TestTable > 2015-06-25 21:33:46,790 INFO [main-EventThread] procedure.ZKProcedureMemberRpcs: Received created event:/hbase/flush-table-proc/abort/TestTable > 2015-06-25 21:33:46,791 INFO [main-EventThread] procedure.ZKProcedureMemberRpcs: Received procedure abort children changed event: /hbase/flush-table-proc/abort > 2015-06-25 21:33:46,803 INFO [main-EventThread] procedure.ZKProcedureMemberRpcs: Received procedure start children changed event: /hbase/flush-table-proc/acquired > 2015-06-25 21:33:46,818 INFO [main-EventThread] procedure.ZKProcedureMemberRpcs: Received procedure abort children changed event: /hbase/flush-table-proc/abort > {code} > Will check this on what is the reason behind it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)