hbase-user mailing list archives

From Vincent Barat <vba...@ubikod.com>
Subject NotServingRegionException exception after the loss of a regionserver
Date Fri, 17 Jun 2011 13:42:01 GMT

This morning, on our production system, we experienced some very bad behavior from HBase 0.20.6.

1- one of our region servers crashed
2- we restarted it successfully (no errors on the master or on the region servers)
3- but we discovered that our HBase clients were unable to recover from this situation:

Each time a get() was performed, but ONLY ON THE BIGGEST TABLES, our HBase clients triggered
an exception (actually coming from the restarted region server):
org.apache.hadoop.hbase.NotServingRegionException: org.apache.hadoop.hbase.NotServingRegionException:
         at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:2269)
         at org.apache.hadoop.hbase.regionserver.HRegionServer.get(HRegionServer.java:1732)
         at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
         at java.lang.reflect.Method.invoke(Method.java:597)
         at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:657)
         at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915)

Stranger still:

4- only clients READING from HBase triggered this exception: clients writing to HBase recovered
from this failure without any error (and the writes were actually performed)

To fix this, we had to restart all our HBase clients reading from the BIGGEST TABLES. So we
guess that the issue comes from the HBase client library or from the region server itself.

We can reproduce this bug easily on our development servers: we kill a region server, we restart
it, and clients trying to "get" from regions served by the killed/restarted region server get
this exception until we restart them.

So my questions are:

Is this a known issue?
Has it been fixed in HBase 0.90?
Is it required to handle this exception in a special way on the client side (e.g. close / reopen
the table)?
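
To make the last question concrete, here is a minimal sketch of the kind of client-side
handling we have in mind. It is not something we know to be required or sufficient. It uses
no HBase classes: RetryWithReopen, Reopener and callWithReopen are hypothetical names, the
Callable stands for our get(), and reopen() stands for closing and reopening the table. It
relies only on NotServingRegionException being an IOException.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

// Hypothetical helper, not part of HBase: retry a read after reopening the table handle.
final class RetryWithReopen {

    // Hypothetical hook: the caller discards its table handle and creates a fresh one.
    interface Reopener {
        void reopen() throws Exception;
    }

    // Runs op up to maxAttempts times. After each IOException (the family that
    // NotServingRegionException belongs to) it calls reopener.reopen() before retrying.
    // If every attempt fails, the last IOException is rethrown.
    static <T> T callWithReopen(Callable<T> op, Reopener reopener, int maxAttempts)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.call();
            } catch (IOException e) {
                last = e;
                if (attempt < maxAttempts) {
                    // Assumption: a fresh handle drops whatever stale state the client holds.
                    reopener.reopen();
                }
            }
        }
        throw last;
    }
}
```

With something like this, each read would go through callWithReopen(...) instead of calling
get() directly. Whether reopening the table actually clears the problem is exactly what we
are asking.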

Thanks a lot
