Date: Sun, 21 Jun 2009 20:30:07 -0700 (PDT)
From: "ryan rawson (JIRA)"
To: hbase-dev@hadoop.apache.org
Subject: [jira] Commented: (HBASE-1560) TIF (and other clients?) cant seem to find one region (getClosestRowBefore issue?)
In-Reply-To: <1315873836.1245637567338.JavaMail.jira@brutus> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 X-Virus-Checked: Checked by ClamAV on apache.org [ https://issues.apache.org/jira/browse/HBASE-1560?page=3Dcom.atlassian= .jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=3D1272= 2466#action_12722466 ]=20 ryan rawson commented on HBASE-1560: ------------------------------------ this looks like a TIF issue: 2009-06-21 20:18:31,590 DEBUG org.apache.hadoop.hbase.client.HConnectionMan= ager$TableServers: Got ZooKeeper event, state: SyncConnected, type: None, p= ath: null 2009-06-21 20:18:31,608 DEBUG org.apache.hadoop.hbase.zookeeper.ZooKeeperWr= apper: Read ZNode /hbase/root-region-server got 10.20.20.165:60020 2009-06-21 20:18:31,646 DEBUG org.apache.hadoop.hbase.client.HConnectionMan= ager$TableServers: Found ROOT at 10.20.20.165:60020 2009-06-21 20:18:31,656 DEBUG org.apache.hadoop.hbase.client.HTable$ClientS= canner: Creating scanner over table_name starting at key '^@^R^?c^@^@^A^V= =EF=BF=BDv=EF=BF=BD@^@B=EF=BF=BD=EF=BF=BD' 2009-06-21 20:18:31,656 DEBUG org.apache.hadoop.hbase.client.HTable$ClientS= canner: Advancing internal scanner to startKey at '^@^R^?c^@^@^A^V=EF=BF=BD= v=EF=BF=BD@^@B=EF=BF=BD=EF=BF=BD' 2009-06-21 20:18:31,671 INFO org.apache.hadoop.mapred.MapTask: numReduceTas= ks: 1 2009-06-21 20:18:31,679 INFO org.apache.hadoop.mapred.MapTask: io.sort.mb = =3D 100 2009-06-21 20:18:31,801 INFO org.apache.hadoop.mapred.MapTask: data buffer = =3D 79691776/99614720 2009-06-21 20:18:31,801 INFO org.apache.hadoop.mapred.MapTask: record buffe= r =3D 262144/327680 2009-06-21 20:19:31,874 DEBUG org.apache.hadoop.hbase.mapred.TableInputForm= atBase: recovered from org.apache.hadoop.hbase.UnknownScannerException: org= .apache.hadoop.hbase.UnknownScan nerException: 7976142196877173570 at 
org.apache.hadoop.hbase.regionserver.HRegionServer.close(HRegion= Server.java:1905) at sun.reflect.GeneratedMethodAccessor19.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethod= AccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:6= 43) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.= java:913) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Me= thod) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeCons= tructorAccessorImpl.java:39) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Delega= tingConstructorAccessorImpl.java:27) at java.lang.reflect.Constructor.newInstance(Constructor.java:513) at org.apache.hadoop.hbase.RemoteExceptionHandler.decodeRemoteExcep= tion(RemoteExceptionHandler.java:94) at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.g= etRegionServerWithRetries(HConnectionManager.java:928) at org.apache.hadoop.hbase.client.HTable$ClientScanner.nextScanner(= HTable.java:1809) at org.apache.hadoop.hbase.client.HTable$ClientScanner.next(HTable.= java:1904) at org.apache.hadoop.hbase.mapred.TableInputFormatBase$TableRecordR= eader.next(TableInputFormatBase.java:219) at org.apache.hadoop.hbase.mapred.TableInputFormatBase$TableRecordR= eader.next(TableInputFormatBase.java:90) at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(= MapTask.java:191) at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTas= k.java:175) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:356) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305) at org.apache.hadoop.mapred.Child.main(Child.java:170) 2009-06-21 20:19:31,874 DEBUG org.apache.hadoop.hbase.client.HTable$ClientS= canner: Creating scanner over table_name starting at key 'null' 2009-06-21 20:19:31,874 DEBUG 
org.apache.hadoop.hbase.client.HTable$ClientS= canner: Advancing internal scanner to startKey at 'null' 2009-06-21 20:22:07,895 WARN org.apache.hadoop.mapred.TaskTracker: Error ru= nning child org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to contact= region server null for region , row '', but failed after 10 attempts. Exceptions: java.lang.NullPointerException java.lang.NullPointerException java.lang.NullPointerException java.lang.NullPointerException java.lang.NullPointerException java.lang.NullPointerException java.lang.NullPointerException java.lang.NullPointerException java.lang.NullPointerException (elided, same as before) Looks like the problem is in TIF & scan client code, since if you give 'nul= l' as start key, it will fail in this ugly way. But there seems to be more problems in TIF, while this one failed after the= server threw a scanner issue, in the 'successful' version the logfile is a= lso problematic: 2009-06-21 20:18:31,637 DEBUG org.apache.hadoop.hbase.client.HTable$ClientS= canner: Creating scanner over table_name starting at key '' 2009-06-21 20:18:31,637 DEBUG org.apache.hadoop.hbase.client.HTable$ClientS= canner: Advancing internal scanner to startKey at '' 2009-06-21 20:18:31,637 DEBUG org.apache.hadoop.hbase.client.HConnectionMan= ager$TableServers: Cache hit for row <> in tableName table_name: location s= erver 10.20.20.155:60020, location region name table_name,,1245570713261 2009-06-21 20:18:31,647 INFO org.apache.hadoop.mapred.MapTask: numReduceTas= ks: 1 2009-06-21 20:18:31,665 INFO org.apache.hadoop.mapred.MapTask: io.sort.mb = =3D 100 2009-06-21 20:18:31,786 INFO org.apache.hadoop.mapred.MapTask: data buffer = =3D 79691776/99614720 2009-06-21 20:18:31,786 INFO org.apache.hadoop.mapred.MapTask: record buffe= r =3D 262144/327680 2009-06-21 20:19:06,664 DEBUG org.apache.hadoop.hbase.client.HTable$ClientS= canner: Advancing forward from region REGION =3D> {NAME =3D> 'table_name,,1= 245570713261', STARTKEY =3D> '', 
ENDKEY =3D> '\x00\x02\x13\x88\x00\x00\x01\= x05\xB6\xBFB\xA0\x00\x05B\xE0', ENCODED =3D> 497118635, TABLE =3D> {{NAME = =3D> 'table_name', MEMCACHE_FLUSHSIZE =3D> '67108864', MAX_FILESIZE =3D> '5= 36870912', READONLY =3D> 'false', FAMILIES =3D> [{NAME =3D> 'default', COMP= RESSION =3D> 'LZO', VERSIONS =3D> '3', TTL =3D> '2147483647', BLOCKSIZE =3D= > '65536', IN_MEMORY =3D> 'false', BLOCKCACHE =3D> 'true'}]}} 2009-06-21 20:19:06,665 DEBUG org.apache.hadoop.hbase.client.HTable$ClientS= canner: Advancing internal scanner to startKey at '^@^B^S=EF=BF=BD^@^@^A^E= =EF=BF=BD=EF=BF=BDB=EF=BF=BD^@^EB=EF=BF=BD' 2009-06-21 20:19:06,677 DEBUG org.apache.hadoop.hbase.client.HTable$ClientS= canner: Advancing forward from region REGION =3D> {NAME =3D> 'table_name,\x= 00\x02\x13\x88\x00\x00\x01\x05\xB6\xBFB\xA0\x00\x05B\xE0,1245570713261', ST= ARTKEY =3D> '\x00\x02\x13\x88\x00\x00\x01\x05\xB6\xBFB\xA0\x00\x05B\xE0', E= NDKEY =3D> '\x00\x03\xF6\x5C\x00\x00\x01\x05\xD5\x27d\xE8\x00\x09eQ', ENCOD= ED =3D> 1054973557, TABLE =3D> {{NAME =3D> 'table_name', MEMCACHE_FLUSHSIZE= =3D> '67108864', MAX_FILESIZE =3D> '536870912', READONLY =3D> 'false', FAM= ILIES =3D> [{NAME =3D> 'default', COMPRESSION =3D> 'LZO', VERSIONS =3D> '3'= , TTL =3D> '2147483647', BLOCKSIZE =3D> '65536', IN_MEMORY =3D> 'false', BL= OCKCACHE =3D> 'true'}]}} 2009-06-21 20:19:06,677 DEBUG org.apache.hadoop.hbase.client.HTable$ClientS= canner: Advancing internal scanner to startKey at '^@^C=EF=BF=BD\^@^@^A^E= =EF=BF=BD'd=EF=BF=BD^@ eQ' 2009-06-21 20:19:06,689 DEBUG org.apache.hadoop.hbase.client.HTable$ClientS= canner: Advancing forward from region REGION =3D> {NAME =3D> 'table_name,\x= 00\x03\xF6\x5C\x00\x00\x01\x05\xD5\x27d\xE8\x00\x09eQ,1245570960707', START= KEY =3D> '\x00\x03\xF6\x5C\x00\x00\x01\x05\xD5\x27d\xE8\x00\x09eQ', ENDKEY = =3D> '\x00\x05\xF0\x9F\x00\x00\x01\x03\x7D\xBB\x858\x00\x04\x7B\xDE', ENCOD= ED =3D> 171207314, TABLE =3D> {{NAME =3D> 'table_name', MEMCACHE_FLUSHSIZE = =3D> '67108864', MAX_FILESIZE 
=3D> '536870912', READONLY =3D> 'false', FAMI= LIES =3D> [{NAME =3D> 'default', COMPRESSION =3D> 'LZO', VERSIONS =3D> '3',= TTL =3D> '2147483647', BLOCKSIZE =3D> '65536', IN_MEMORY =3D> 'false', BLO= CKCACHE =3D> 'true'}]}} Even though the scanner starts at '' it seems to continue to more than 1 re= gion, which is not supposed to happen since # of mappers =3D # of regions. Other mappers indicate the same issue - going past the end of the region th= ey were assigned to. Very mysterious problem here! > TIF (and other clients?) cant seem to find one region (getClosestRowBefor= e issue?) > -------------------------------------------------------------------------= --------- > > Key: HBASE-1560 > URL: https://issues.apache.org/jira/browse/HBASE-1560 > Project: Hadoop HBase > Issue Type: Bug > Affects Versions: 0.20.0 > Reporter: ryan rawson > Priority: Blocker > Fix For: 0.20.0 > > > running a full TIF-mr on a table, it eventually fails, all on 1 of the sp= lits, and all with the same exception set, which is: > org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to conta= ct region server null for region , row '', but failed after 10 attempts. 
> Exceptions:
> java.lang.NullPointerException
> java.lang.NullPointerException
> java.lang.NullPointerException
> java.lang.NullPointerException
> java.lang.NullPointerException
> java.lang.NullPointerException
> java.lang.NullPointerException
> java.lang.NullPointerException
> java.lang.NullPointerException
> java.lang.NullPointerException
>     at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getRegionServerWithRetries(HConnectionManager.java:935)
>     at org.apache.hadoop.hbase.client.HTable$ClientScanner.nextScanner(HTable.java:1842)
>     at org.apache.hadoop.hbase.client.HTable$ClientScanner.initialize(HTable.java:1790)
>     at org.apache.hadoop.hbase.client.HTable.getScanner(HTable.java:369)
>     at org.apache.hadoop.hbase.mapred.TableInputFormatBase$TableRecordReader.restart(TableInputFormatBase.java:121)
>     at org.apache.hadoop.hbase.mapred.TableInputFormatBase$TableRecordReader.next(TableInputFormatBase.java:222)
>     at org.apache.hadoop.hbase.mapred.TableInputFormatBase$TableRecordReader.next(TableInputFormatBase.java:90)
>     at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:191)
>     at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:175)
>     at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
>     at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:356)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>     at org.apache.hadoop.mapred.Child.main(Child.java:170)
> Suspicion: We can't locate the 'root' region with key '' or null.  Probably an issue with getClosestRowBefore.
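The comment turns on two invariants: a null start key should behave like '' (start of table), and a record reader bound to one region's split should never advance past that region's end key. Below is a minimal, dependency-free Java sketch of those two checks. It is not the TableInputFormatBase code and not HBase API: SplitBounds, normalizeStartKey, compareUnsigned and inSplit are hypothetical names used only for illustration. It assumes the usual HBase region semantics, where a region covers [startKey, endKey), rows sort as unsigned bytes, and an empty end key means "end of table".

```java
/**
 * Hypothetical helpers, NOT part of the HBase API: a sketch of the start/end
 * key invariants a per-region TableInputFormat split is expected to respect.
 */
public final class SplitBounds {
    private static final byte[] EMPTY = new byte[0];

    private SplitBounds() {}

    /** Assumption: a null start key should mean the same as '' (start of table). */
    public static byte[] normalizeStartKey(byte[] startKey) {
        return startKey == null ? EMPTY : startKey;
    }

    /** Lexicographic comparison treating each byte as unsigned (0x00..0xFF). */
    public static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff;
            int y = b[i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        // A proper prefix sorts before the longer key.
        return a.length - b.length;
    }

    /**
     * True if row falls in [startKey, endKey). The end key is exclusive, and an
     * empty (or null) end key means the split runs to the end of the table.
     */
    public static boolean inSplit(byte[] row, byte[] startKey, byte[] endKey) {
        boolean afterStart = compareUnsigned(row, normalizeStartKey(startKey)) >= 0;
        boolean beforeEnd = endKey == null || endKey.length == 0
                || compareUnsigned(row, endKey) < 0;
        return afterStart && beforeEnd;
    }

    public static void main(String[] args) {
        // ENDKEY of the first region (table_name,,1245570713261) from the log above.
        byte[] firstEnd = {0x00, 0x02, 0x13, (byte) 0x88, 0x00, 0x00, 0x01, 0x05,
                (byte) 0xB6, (byte) 0xBF, 0x42, (byte) 0xA0, 0x00, 0x05, 0x42, (byte) 0xE0};
        System.out.println(inSplit(new byte[] {0x00, 0x01}, null, firstEnd)); // true
        // The first row of the next region lies outside the first split, so a reader
        // for that split should stop here instead of "advancing forward".
        System.out.println(inSplit(firstEnd, null, firstEnd)); // false
    }
}
```

Running main prints true, then false. In real code HBase's own row comparator would take the place of the hand-rolled compareUnsigned; the point is only that a reader for the first split should stop once a row compares at or above that split's end key, which the 'successful' log above shows not happening.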