hbase-user mailing list archives

From Stack <st...@duboce.net>
Subject Re: Lease does not exist exceptions
Date Tue, 18 Oct 2011 11:06:05 GMT
Look back in the mailing list archives, Eran, for more detailed answers,
but in essence the exception below usually means that the client has been
away from the server for too long.  This can happen for a few reasons.  If
you fetch lots of rows per next() on a scanner, processing the batch
client-side may take longer than the lease timeout.  Turn down the
prefetch size and see if that helps (I'm talking about this:
http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html#setBatch(int)).
A GC pause on the client side, or over on the server side, can also push
you past the lease timeout.  Are your mapreduce jobs heavy-duty, robbing
resources from the running regionservers or datanodes?  Try running them
with half the mappers and see if that makes it more likely your job will
complete.
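
For what it's worth, here is a minimal sketch of turning the batching down
when wiring up a table-mapper job.  The table name ("my_table") and the
CountingMapper class are placeholders, not taken from Eran's job; the rest
is just the usual TableMapReduceUtil setup:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

public class SmallBatchScanJob {

  // Placeholder mapper: just counts the rows it sees.
  static class CountingMapper
      extends TableMapper<ImmutableBytesWritable, Result> {
    @Override
    protected void map(ImmutableBytesWritable row, Result values, Context ctx)
        throws IOException, InterruptedException {
      ctx.getCounter("scan", "rows").increment(1);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = new Job(conf, "small-batch-scan");
    job.setJarByClass(SmallBatchScanJob.class);

    Scan scan = new Scan();
    // Fewer rows per scanner next() RPC means less client-side work per
    // batch, so each next() call should come back inside the lease period.
    scan.setCaching(100);
    // For very wide rows, also cap the number of columns per Result.
    scan.setBatch(100);
    // Don't pollute the block cache with a full-table scan.
    scan.setCacheBlocks(false);

    TableMapReduceUtil.initTableMapperJob(
        "my_table",                // placeholder table name
        scan,
        CountingMapper.class,
        ImmutableBytesWritable.class,
        Result.class,
        job);

    job.setNumReduceTasks(0);
    job.setOutputFormatClass(NullOutputFormat.class);
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

If lowering the batching isn't enough, the scanner lease itself is
configurable on the regionservers (hbase.regionserver.lease.period in
hbase-site.xml, 60 seconds by default in 0.90, if I remember right), but
raising it mostly just papers over a slow client.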

St.Ack
P.S. IIRC, J-D tripped over a cause of this recently, but I can't find it at the moment.

On Tue, Oct 18, 2011 at 10:28 AM, Eran Kutner <eran@gigya.com> wrote:
> Hi,
> I'm having a problem running map/reduce on a table with about 500 regions.
> The MR job shows this kind of exception:
> 11/10/18 06:03:39 INFO mapred.JobClient: Task Id : attempt_201110030100_0086_m_000062_0, Status : FAILED
> org.apache.hadoop.hbase.regionserver.LeaseException: org.apache.hadoop.hbase.regionserver.LeaseException: lease '-334679770697295011' does not exist
>        at org.apache.hadoop.hbase.regionserver.Leases.removeLease(Leases.java:230)
>        at org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:1845)
>        at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source)
>        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>        at java.lang.reflect.Method.invoke(Method.java:597)
>        at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:570)
>        at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1039)
>
>        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
>        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
>        at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
>        at org.apache.hadoop.hbase.RemoteExceptionHandler.decodeRemoteException(RemoteExceptionHandler.java:96)
>        at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:83)
>        at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:1)
>        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionServerWithRetries(HConnectionManager.java:1019)
>        at org.apache.hadoop.hbase.client.HTable$ClientScanner.next(HTable.java:1151)
>        at org.apache.hadoop.hbase.mapreduce.TableRecordReaderImpl.nextKeyValue(TableRecordReaderImpl.java:149)
>        at org.apache.hadoop.hbase.mapreduce.TableRecordReader.nextKeyValue(TableRecordReader.java:142)
>        at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:456)
>        at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
>        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
>        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:647)
>        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:323)
>        at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
>        at java.security.AccessController.doPrivileged(Native Method)
>        at javax.security.auth.Subject.doAs(Subject.java:396)
>        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
>        at org.apache.hadoop.mapred.Child.main(Child.java:264)
>
> The HBase logs are full of these:
> 2011-10-18 06:07:01,425 ERROR org.apache.hadoop.hbase.regionserver.HRegionServer:
> org.apache.hadoop.hbase.regionserver.LeaseException: lease '3475143032285946374' does not exist
>        at org.apache.hadoop.hbase.regionserver.Leases.removeLease(Leases.java:230)
>        at org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:1845)
>        at sun.reflect.GeneratedMethodAccessor22.invoke(Unknown Source)
>        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>        at java.lang.reflect.Method.invoke(Method.java:597)
>        at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:570)
>        at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1039)
>
>
> and the datanode logs have a few of these (seemingly far fewer than the
> HBase errors):
> 2011-10-18 06:16:42,550 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.1.104.4:50010, storageID=DS-15546166-10.1.104.4-50010-1298985607414, infoPort=50075, ipcPort=50020):DataXceiver
> java.net.SocketTimeoutException: 480000 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/10.1.104.4:50010 remote=/10.1.104.1:57232]
>        at org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246)
>        at org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:159)
>        at org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:198)
>        at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendChunks(BlockSender.java:350)
>        at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:436)
>        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:214)
>        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:114)
>
> I've increased all the relevant limits I know of (which were high to begin
> with), so now I have 64K file descriptors and dfs.datanode.max.xcievers is
> 8192. I've restarted everything in the cluster to make sure all the
> processes picked up the new configuration, but I still get those errors.
> They always begin when the map phase is around 12-14%, and eventually the
> job fails at ~50%. Running random scans against the same HBase table while
> the job is running seems to work fine.
>
> I'm using Hadoop 0.20.2+923.97-1 from CDH3 and HBase 0.90.4, compiled from
> the branch code a while ago.
>
> Are there any other settings I'm missing, or other ideas about what could
> be causing this?
>
> Thanks.
>
> -eran
>
