Return-Path: X-Original-To: apmail-hbase-user-archive@www.apache.org Delivered-To: apmail-hbase-user-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 9D81D10A01 for ; Tue, 25 Mar 2014 23:05:04 +0000 (UTC) Received: (qmail 23909 invoked by uid 500); 25 Mar 2014 23:05:00 -0000 Delivered-To: apmail-hbase-user-archive@hbase.apache.org Received: (qmail 23832 invoked by uid 500); 25 Mar 2014 23:05:00 -0000 Mailing-List: contact user-help@hbase.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: user@hbase.apache.org Delivered-To: mailing list user@hbase.apache.org Received: (qmail 23824 invoked by uid 99); 25 Mar 2014 23:05:00 -0000 Received: from athena.apache.org (HELO athena.apache.org) (140.211.11.136) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 25 Mar 2014 23:05:00 +0000 X-ASF-Spam-Status: No, hits=3.2 required=5.0 tests=FREEMAIL_REPLY,HTML_MESSAGE,RCVD_IN_DNSWL_NONE,SPF_PASS X-Spam-Check-By: apache.org Received-SPF: pass (athena.apache.org: domain of m_garg@hotmail.com designates 65.54.190.95 as permitted sender) Received: from [65.54.190.95] (HELO bay0-omc2-s20.bay0.hotmail.com) (65.54.190.95) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 25 Mar 2014 23:04:54 +0000 Received: from BAY175-W29 ([65.54.190.123]) by bay0-omc2-s20.bay0.hotmail.com with Microsoft SMTPSVC(6.0.3790.4675); Tue, 25 Mar 2014 16:04:33 -0700 X-TMN: [PBxSBmoItXH6KeLybUwDQfW2axbjwvWq] X-Originating-Email: [m_garg@hotmail.com] Message-ID: Content-Type: multipart/alternative; boundary="_0192959e-a4e8-4b77-a4ad-04ba75e54fb3_" From: Mahesh Garg To: HBase Mailing List Subject: RE: HBase bulk load failing with RegionTooBusyException Date: Tue, 25 Mar 2014 16:04:33 -0700 Importance: Normal In-Reply-To: References: , MIME-Version: 1.0 X-OriginalArrivalTime: 25 Mar 2014 23:04:33.0919 (UTC) FILETIME=[962610F0:01CF487E] X-Virus-Checked: Checked by ClamAV on apache.org --_0192959e-a4e8-4b77-a4ad-04ba75e54fb3_ Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable Yes. It took 132 seconds (from 03:11:26 -> 03:13:12). I also do see some wa= rnings saying "operationTooSlow".=20 Also=2C please note that the "major compaction" has been disabled. Here is the log. (Please excuse for length of the log. But=2C I hope it wil= l be helpful.) Thanks again. Garg ---------------------------------------------------------------------------= ----------------------------------------------- 2014-03-25 03:11:26=2C650 INFO org.apache.hadoop.hbase.regionserver.HRegion= : Starting compaction on cf2 in region table50=2C\x01\xD5UUUU_\x81\x814UUV\= x9ASR[\xD4\xB4UUUU=2C1395709055737.be366e9e6be9675af7be72baf29f03de. 2014-03-25 03:11:26=2C651 INFO org.apache.hadoop.hbase.regionserver.Store: = Starting compaction of 10 file(s) in cf2 of table50=2C\x01\xD5UUUU_\x81\x81= 4UUV\x9ASR[\xD4\xB4UUUU=2C1395709055737.be366e9e6be9675af7be72baf29f03de. i= nto tmpdir=3Dhdfs://AA.BB.CC.DD:8020/hbase/table50/be366e9e6be9675af7be72ba= f29f03de/.tmp=2C seqid=3D0=2C totalSize=3D327.5 M 2014-03-25 03:11:26=2C668 INFO org.apache.hadoop.hbase.regionserver.StoreFi= le: Delete Family Bloom filter type for hdfs://AA.BB.CC.DD:8020/hbase/table= 50/be366e9e6be9675af7be72baf29f03de/.tmp/5951be6151474f369126cd38fc4b8f60: = CompoundBloomFilterWriter 2014-03-25 03:11:41=2C029 WARN org.apache.hadoop.ipc.HBaseServer: (operatio= nTooSlow): {"processingtimems":25293=2C"client":"XX.XX.XX.XX:54862"=2C"time= Range":[0=2C9223372036854775807]=2C"starttimems":1395717075733=2C"responses= ize":0=2C"class":"HRegionServer"=2C"cacheBlocks":false=2C"table":"table50"= =2C"families":{"cf2":["ALL"]=2C"cf":["ALL"]}=2C"batch":1=2C"startRow":"\\x0= 5\\x80\\x00\\x00\\x00\\x00\\x1E\\x84\\x80\\xDF\\x00\\x00\\x01D\\xFD\\xFD\\x= 06\\x7F^\\xFF\\xFF\\xFF\\xFD"=2C"queuetimems":0=2C"stopRow":"\\x06j\\xAA\\x= AA\\xAA\\xAA\\xCEE@\\xC9\\xAA\\xAA\\xAB\\xEF\\xA8\\xA7\\xB1*\\x09\\xAA\\xAA= \\xAA\\xA7"=2C"method":"openScanner"=2C"totalColumns":2=2C"maxVersions":1= =2C"filter":"KeyOnlyFilter"=2C"caching":1} 2014-03-25 03:11:41=2C030 WARN org.apache.hadoop.ipc.HBaseServer: (operatio= nTooSlow): {"processingtimems":56342=2C"client":"XX.XX.XX.XX:54577"=2C"time= Range":[0=2C9223372036854775807]=2C"starttimems":1395717044685=2C"responses= ize":0=2C"class":"HRegionServer"=2C"cacheBlocks":false=2C"table":"table50"= =2C"families":{"cf2":["ALL"]=2C"cf":["ALL"]}=2C"batch":1=2C"startRow":"\\x0= 5\\x80\\x00\\x00\\x00\\x00\\x1E\\x84\\x80\\xDF\\x00\\x00\\x01D\\xFD\\xFD\\x= 06\\x7F^\\xFF\\xFF\\xFF\\xFD"=2C"queuetimems":0=2C"stopRow":"\\x06j\\xAA\\x= AA\\xAA\\xAA\\xCEE@\\xC9\\xAA\\xAA\\xAB\\xEF\\xA8\\xA7\\xB1*\\x09\\xAA\\xAA= \\xAA\\xA7"=2C"method":"openScanner"=2C"totalColumns":2=2C"maxVersions":1= =2C"filter":"KeyOnlyFilter"=2C"caching":1} 2014-03-25 03:12:12=2C085 INFO org.apache.hadoop.hbase.regionserver.HRegion= Server: Client tried to access missing scanner 7468301718262159439 (no leas= e) 2014-03-25 03:12:12=2C086 ERROR org.apache.hadoop.hbase.regionserver.HRegio= nServer: org.apache.hadoop.hbase.UnknownScannerException: No lease for 7468301718262= 159439: lease '7468301718262159439' does not exist at org.apache.hadoop.hbase.regionserver.HRegionServer.internalNext(= HRegionServer.java:2640) at org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionS= erver.java:2609) ...... 2014-03-25 03:12:12=2C088 ERROR org.apache.hadoop.hbase.regionserver.HRegio= nServer: org.apache.hadoop.hbase.regionserver.LeaseException: lease '746830171826215= 9439' does not exist at org.apache.hadoop.hbase.regionserver.Leases.removeLease(Leases.j= ava:231) at org.apache.hadoop.hbase.regionserver.Leases.cancelLease(Leases.j= ava:214) at org.apache.hadoop.hbase.regionserver.HRegionServer.internalClose= Scanner(HRegionServer.java:2749) at org.apache.hadoop.hbase.regionserver.HRegionServer.close(HRegion= Server.java:2723) ...... 2014-03-25 03:12:40=2C981 ERROR org.apache.hadoop.hbase.regionserver.HRegio= nServer: org.apache.hadoop.hbase.ipc.CallerDisconnectedException: Aborting on region= table50=2C\x05\x80\x00\x00\x00\x00\x1E\x84\x80\xDF\x00\x00\x01D\xFD\xFD\x0= 6\x7F^\xFF\xFF\xFF\xFD=2C1395709055737.5a772656accd69dd0354a09fc0c4415f.=2C= call next(7468301718262159439=2C 1)=2C rpc version=3D1=2C client version= =3D29=2C methodsFingerPrint=3D-1105746420 from XX.XX.XX.XX:54928 after 5993= 7 ms=2C since caller disconnected at org.apache.hadoop.hbase.ipc.HBaseServer$Call.throwExceptionIfCal= lerDisconnected(HBaseServer.java:437) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.n= extInternal(HRegion.java:3985) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.n= extRaw(HRegion.java:3891) ...... 2014-03-25 03:12:41=2C027 INFO org.apache.hadoop.hbase.regionserver.HRegion= Server: Scanner 9121928292667158981 lease expired on region table50=2C\x05\= x80\x00\x00\x00\x00\x1E\x84\x80\xDF\x00\x00\x01D\xFD\xFD\x06\x7F^\xFF\xFF\x= FF\xFD=2C1395709055737.5a772656accd69dd0354a09fc0c4415f. 2014-03-25 03:13:12=2C561 INFO org.apache.hadoop.hbase.regionserver.StoreFi= le: NO General Bloom and NO DeleteFamily was added to HFile (hdfs://AA.BB.C= C.DD:8020/hbase/table50/be366e9e6be9675af7be72baf29f03de/.tmp/5951be6151474= f369126cd38fc4b8f60) 2014-03-25 03:13:12=2C608 INFO org.apache.hadoop.hbase.regionserver.Store: = Renaming compacted file at hdfs://AA.BB.CC.DD:8020/hbase/table50/be366e9e6b= e9675af7be72baf29f03de/.tmp/5951be6151474f369126cd38fc4b8f60 to hdfs://AA.B= B.CC.DD:8020/hbase/table50/be366e9e6be9675af7be72baf29f03de/cf2/5951be61514= 74f369126cd38fc4b8f60 2014-03-25 03:13:12=2C696 INFO org.apache.hadoop.hbase.regionserver.Store: = Completed compaction of 10 file(s) in cf2 of table50=2C\x01\xD5UUUU_\x81\x8= 14UUV\x9ASR[\xD4\xB4UUUU=2C1395709055737.be366e9e6be9675af7be72baf29f03de. = into 5951be6151474f369126cd38fc4b8f60=2C size=3D112.3 M=3B total size for s= tore is 669.1 M 2014-03-25 03:13:12=2C696 INFO org.apache.hadoop.hbase.regionserver.compact= ions.CompactionRequest: completed compaction: regionName=3Dtable50=2C\x01\x= D5UUUU_\x81\x814UUV\x9ASR[\xD4\xB4UUUU=2C1395709055737.be366e9e6be9675af7be= 72baf29f03de.=2C storeName=3Dcf2=2C fileCount=3D10=2C fileSize=3D327.5 M=2C= priority=3D-19=2C time=3D1134091055662452=3B duration=3D1mins=2C 46sec ---------------------------------------------------------------------------= ----------------------------------------------------- > Date: Tue=2C 25 Mar 2014 15:03:16 -0700 > Subject: Re: HBase bulk load failing with RegionTooBusyException > From: yuzhihong@gmail.com > To: user@hbase.apache.org >=20 > Can you check region server log to see if there was concurrent compaction > going on for table50=2C\x05\x80\x00\x00\x00\x00\x1E\x84\x80\ > xDF\x00\x00\x01D\xFD\xFD\x06\x7F^\xFF\xFF\xFF\xFD=2C1395709055737. > 5a772656accd69dd0354a09fc0c4415f. ? >=20 > If so=2C how long did the compaction take ? >=20 > Cheers >=20 >=20 > On Tue=2C Mar 25=2C 2014 at 2:52 PM=2C Mahesh Garg w= rote: >=20 > > Hi=2C > > > > We are trying to bulk load into HBase. But=2C it it failing. We are usi= ng a > > very standard process: > > > > 1. Create HFiles using an MR job. > > 2. Call "completebulkload" to move HFiles into regions. > > > > The table is pre-split. We are using HBase version "0.94.15-cdh4.6.0". > > > > The load process runs in a loop and=2C about 10 million rows are loaded= in > > each attempt. However=2C the process fails in > > "completebulkload" with RegionTooBusyException. This error happen at > > different times after loading about 120 M rows. > > > > Any suggestion on how to avoid/debug this problem? Any parameters to tu= ne? > > The stack track is listed below. > > > > Thanks for your help! > > Garg > > > > > > 14/03/25 03:10:39 INFO mapreduce.LoadIncrementalHFiles: Trying to load > > hfile=3Dhdfs://XXX.XXX.XXX.XXX:8020/user/userName/outputdir/cf2/e2680ae= e969a47ae97c6980bbd76ea66 > > first=3D\x06\x00\x00\x00\x00\x00\x00\x00\x07_0_\x00\x00\x01D\xF73\xB7\x= 1F_\x00\x00\x00\x01 > > last=3D\x06\x00\x00\x00\x00\x00w[B_25_\x00\x00\x01D\xF74\xD7\xAC_\x00\x= 00\x00\x01 > > 14/03/25 03:21:50 ERROR mapreduce.LoadIncrementalHFiles: Encountered > > unrecoverable error from region server > > org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after > > attempts=3D10=2C exceptions: > > Tue Mar 25 03:11:39 UTC 2014=2C > > org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles$3@29324479=2C > > org.apache.hadoop.hbase.RegionTooBusyException: > > org.apache.hadoop.hbase.RegionTooBusyException: failed to get a lock in > > 60000 ms. > > regionName=3Dtable50=2C\x05\x80\x00\x00\x00\x00\x1E\x84\x80\xDF\x00\x00= \x01D\xFD\xFD\x06\x7F^\xFF\xFF\xFF\xFD=2C1395709055737.5a772656accd69dd0354= a09fc0c4415f.=2C > > server=3DXXX.XXX.XXX.XXX=2C60020=2C1395216882104 > > at > > org.apache.hadoop.hbase.regionserver.HRegion.lock(HRegion.java:5898) > > at > > org.apache.hadoop.hbase.regionserver.HRegion.lock(HRegion.java:5884) > > at > > org.apache.hadoop.hbase.regionserver.HRegion.startBulkRegionOperation(H= Region.java:5843) > > at > > org.apache.hadoop.hbase.regionserver.HRegion.bulkLoadHFiles(HRegion.jav= a:3640) > > at > > org.apache.hadoop.hbase.regionserver.HRegion.bulkLoadHFiles(HRegion.jav= a:3623) > > at > > org.apache.hadoop.hbase.regionserver.HRegionServer.bulkLoadHFiles(HRegi= onServer.java:2978) > > at sun.reflect.GeneratedMethodAccessor61.invoke(Unknown Source) > > at > > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccesso= rImpl.java:25) > > at java.lang.reflect.Method.invoke(Method.java:597) > > at > > org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRp > > > > = --_0192959e-a4e8-4b77-a4ad-04ba75e54fb3_--