hbase-user mailing list archives

From Ameya Kanitkar <am...@groupon.com>
Subject Lease Exception Errors When Running Heavy Map Reduce Job
Date Wed, 28 Aug 2013 15:00:55 GMT
Hi all,

We have a very heavy MapReduce job that goes over an entire HBase table
holding more than 1 TB of data and exports all of it to HDFS (similar to
the Export job, but with some additional custom code built in).

However, this job is not very stable: it often fails with the following
error:

org.apache.hadoop.hbase.regionserver.LeaseException:
org.apache.hadoop.hbase.regionserver.LeaseException: lease
'-4456594242606811626' does not exist
	at org.apache.hadoop.hbase.regionserver.Leases.removeLease(Leases.java:231)
	at org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:2429)
	at sun.reflect.GeneratedMethodAccessor42.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:364)
	at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1400)

	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
	at java.lang.reflect.Constructor.newInstance(Constructor.
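
The trace shows `Leases.removeLease` failing inside `HRegionServer.next`, i.e. the RegionServer no longer has a lease for the scanner the client is calling `next()` on. The sketch below is a hypothetical, heavily simplified model of such a lease table (it is not HBase's `Leases` class; `LeaseModel`, `openScanner` and `expireLeases` are invented names). It assumes leases expire when not renewed within the lease period and that a successful `next()` removes and re-adds the lease, as the `removeLease` frame suggests. Time is passed in explicitly so the example is deterministic.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical, simplified model of a server-side scanner lease table.
 * NOT HBase's Leases class; it only illustrates how next() can fail with
 * "lease ... does not exist" once an expired lease has been removed.
 */
public class LeaseModel {
    static class LeaseException extends RuntimeException {
        LeaseException(String msg) { super(msg); }
    }

    private final long leasePeriodMs;
    // scanner id -> time (ms) of the last successful renewal
    private final Map<Long, Long> leases = new HashMap<>();

    LeaseModel(long leasePeriodMs) { this.leasePeriodMs = leasePeriodMs; }

    /** Opening a scanner creates its lease. */
    void openScanner(long scannerId, long nowMs) { leases.put(scannerId, nowMs); }

    /** Background sweep: drop every lease not renewed within the lease period. */
    void expireLeases(long nowMs) {
        leases.values().removeIf(last -> nowMs - last > leasePeriodMs);
    }

    /** Client next() call: the lease must still exist; it is renewed on success. */
    void next(long scannerId, long nowMs) {
        if (leases.remove(scannerId) == null) {
            throw new LeaseException("lease '" + scannerId + "' does not exist");
        }
        leases.put(scannerId, nowMs); // re-add the lease with a fresh timestamp
    }

    public static void main(String[] args) {
        LeaseModel rs = new LeaseModel(60_000);
        rs.openScanner(42L, 0);
        rs.next(42L, 30_000);       // within the period: lease renewed at 30 s
        rs.expireLeases(100_000);   // 70 s since the last renewal: lease dropped
        try {
            rs.next(42L, 100_000);
        } catch (LeaseException e) {
            System.out.println(e.getMessage()); // lease '42' does not exist
        }
    }
}
```

In this model, any gap between two `next()` calls that is longer than the lease period (plus a sweep in between) is enough to produce the error, regardless of how healthy the server otherwise is.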


More detailed logs from the RegionServer (RS) are here: http://pastebin.com/xaHF4ksb

We have changed the following settings in HBase to counter this problem,
but the issue persists:

<property>
<!-- Loaded from hbase-site.xml -->
<name>hbase.regionserver.lease.period</name>
<value>900000</value>
</property>

<property>
<!-- Loaded from hbase-site.xml -->
<name>hbase.rpc.timeout</name>
<value>900000</value>
</property>
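
Raising `hbase.regionserver.lease.period` only helps if the gap between two consecutive scanner `next()` RPCs stays below it. With scanner caching, each RPC returns a batch of rows and the mapper processes the whole batch before the client issues the next RPC, so one rough rule of thumb (an assumption that ignores RPC latency, GC pauses and region moves) is: caching × worst-case per-row processing time < lease period. The sketch below is a hypothetical back-of-the-envelope helper; the caching and per-row numbers are illustrative, not taken from this job.

```java
/**
 * Hypothetical back-of-the-envelope check (rule-of-thumb assumption):
 * is one batch of cached rows processed before the scanner lease expires?
 * Ignores RPC latency, GC pauses and region moves.
 */
public class LeaseBudget {
    /** Worst-case gap between two next() RPCs, in milliseconds. */
    static long worstCaseGapMs(int scannerCaching, long worstPerRowMs) {
        return (long) scannerCaching * worstPerRowMs;
    }

    static boolean fitsInLease(int scannerCaching, long worstPerRowMs, long leasePeriodMs) {
        return worstCaseGapMs(scannerCaching, worstPerRowMs) < leasePeriodMs;
    }

    /** Largest caching value whose worst-case batch stays strictly under the lease period. */
    static int maxSafeCaching(long worstPerRowMs, long leasePeriodMs) {
        if (worstPerRowMs <= 0) {
            throw new IllegalArgumentException("worstPerRowMs must be positive");
        }
        long max = (leasePeriodMs - 1) / worstPerRowMs;
        return (int) Math.max(1, Math.min(Integer.MAX_VALUE, max));
    }

    public static void main(String[] args) {
        long leasePeriodMs = 900_000; // hbase.regionserver.lease.period from the config above
        // Illustrative numbers only: 1000 rows per batch, 1 s worst case per row.
        System.out.println(fitsInLease(1000, 1_000, leasePeriodMs)); // false (1,000,000 ms >= 900,000 ms)
        System.out.println(maxSafeCaching(1_000, leasePeriodMs));    // 899
    }
}
```

If slow rows in the custom export code can make a single batch take longer than the lease period, lowering the scanner caching value shrinks the gap between RPCs, at the cost of more round trips.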


We also reduced the number of mappers per RS to fewer than the number of
CPUs on the box.

We also observed that once the problem happens, it happens multiple times
on the same RS. All other regions are unaffected. However, a different RS
hits this problem on different days, and no particular region is causing
it either.

We are running HBase 0.94.2 with CDH 4.2.0.

Any ideas?


Ameya
