hbase-user mailing list archives

From Jonathan Gray <jg...@facebook.com>
Subject RE: bug report: opening hbase region takes too long , making the region not available for more than 10 minutes.
Date Mon, 14 Jun 2010 23:25:08 GMT
Can you post the log from the regionserver that did not open the region between 12:57 and
13:14?  Please start the excerpt a couple of minutes before 12:57.

Most likely this is less a bug than a current limitation: open/close messages are handled
sequentially.  It's possible that a long-running close (flush) held up processing of the open.
The logs will say more.
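
To illustrate the kind of limitation described above, here is a minimal sketch. It is not HBase's actual code: it assumes a single worker that handles messages strictly one at a time in arrival order, and the message names and durations are hypothetical.

```python
from collections import deque

def completion_times(messages):
    """Return {name: finish_time} for messages handled one at a time, in order.

    `messages` is a list of (name, arrival_time, duration) tuples sorted by
    arrival time. All values are hypothetical seconds, not HBase measurements.
    """
    clock = 0.0
    finished = {}
    queue = deque(messages)
    while queue:
        name, arrival, duration = queue.popleft()
        # The single worker cannot start a message before it arrives,
        # nor before the previous message has finished.
        start = max(clock, arrival)
        clock = start + duration
        finished[name] = clock
    return finished

# A slow close (flush) that arrives just before an open request.
demo = completion_times([
    ("close_region_A", 0.0, 900.0),  # assumed 15-minute flush
    ("open_region_B", 5.0, 2.0),     # the open itself takes about 2 s
])
print(demo["open_region_B"])  # 902.0
```

In this toy model the open finishes at 902 s even though it needs only 2 s of work, because it waits behind the long close.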

This should be much improved in the next major release of HBase.

JG

> -----Original Message-----
> From: Jinsong Hu [mailto:jinsong_hu@hotmail.com]
> Sent: Monday, June 14, 2010 11:24 AM
> To: user@hbase.apache.org
> Subject: bug report: opening hbase region takes too long , making the
> region not available for more than 10 minutes.
> 
> 
> 
> Hi there,
>
>    I have found an HBase bug: opening a region takes too long. The
> client reported a "no server address" error. For the region
> MyOwnEventTable,2010-06-13 10:33:31\x0922f3563bd43a3c3c044bd1db885f1523,1276457581773,
> here is the sequence:
> 
> 
> 
> Around 12:57, all 8 region servers closed this region.
> On machine2037, at 12:57:45,812, it received a request to open this
> region. Usually a worker thread honors the request immediately and
> opens the region within seconds, but in this case the region wasn't
> opened until 13:14:43,341.
> Around 13:16, all the other regionservers received requests to open
> this region, and their worker threads handled them immediately.
> 
> 
> So during the gap from 12:57 to 13:14, the region was not available,
> and the client logged errors while trying to insert records.
> 
> 
> 
> I have read the hbase code. HBase handles this problem by retrying
> 10 times, waiting 10 seconds in between. Essentially it tries for
> 100 seconds.
>
> In this case, even that 100-second retry window doesn't help at 13:10,
> because the region was opened well beyond the 100-second interval.
> 
> 
> 
> This is clearly an hbase bug.
> 
> 
> Jimmy
> 
> 
> 
> 
> Here is the client-side log:
>
> 13:10:03,441 INFO  [ClientCnxn] Attempting connection to server zookeeper2.cloud.mydomain.net/10.110.8.52:2181: No server address listed in .META. for region MyOwnEventTable,2010-06-13 10:33:31\x0922f3563bd43a3c3c044bd1db885f1523,1276457581773
>
> 13:10:03,451 INFO  [ClientCnxn] Server connection successful
>
> org.apache.hadoop.hbase.client.NoServerForRegionException: No server address listed in .META. for region MyOwnEventTable,2010-06-13 10:33:31\x0922f3563bd43a3c3c044bd1db885f1523,1276457581773
> 
> 
> 
> 
> 
> Here are the regionserver-side logs related to this issue.
> 
> 
> machine2035:
>
> 2010-06-14 12:57:23,452 INFO org.apache.hadoop.hbase.regionserver.HRegion: Closed MyOwnEventTable,2010-06-13 10:33:31\x0922f3563bd43a3c3c044bd1db885f1523,1276457581773
>
> 2010-06-14 13:16:37,333 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_OPEN: MyOwnEventTable,2010-06-13 10:33:31\x0922f3563bd43a3c3c044bd1db885f1523,1276457581773
>
> 2010-06-14 13:16:37,333 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Worker: MSG_REGION_OPEN: MyOwnEventTable,2010-06-13 10:33:31\x0922f3563bd43a3c3c044bd1db885f1523,1276457581773
> 
> 
> 
> 
> 
> machine2036:
>
> 2010-06-14 12:57:29,312 INFO org.apache.hadoop.hbase.regionserver.HRegion: Closed MyOwnEventTable,2010-06-13 10:33:31\x0922f3563bd43a3c3c044bd1db885f1523,1276457581773
>
> 2010-06-14 13:16:05,107 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_OPEN: MyOwnEventTable,2010-06-13 10:33:31\x0922f3563bd43a3c3c044bd1db885f1523,1276457581773
>
> 2010-06-14 13:16:05,107 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Worker: MSG_REGION_OPEN: MyOwnEventTable,2010-06-13 10:33:31\x0922f3563bd43a3c3c044bd1db885f1523,1276457581773
> 
> 
> 
> 
> 
> 
> 
> machine2037:
>
> 2010-06-14 12:57:09,986 INFO org.apache.hadoop.hbase.regionserver.HRegion: Closed MyOwnEventTable,2010-06-13 10:33:31\x0922f3563bd43a3c3c044bd1db885f1523,1276457581773
>
> 2010-06-14 12:57:45,812 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_OPEN: MyOwnEventTable,2010-06-13 10:33:31\x0922f3563bd43a3c3c044bd1db885f1523,1276457581773
>
> 2010-06-14 13:14:43,341 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Worker: MSG_REGION_OPEN: MyOwnEventTable,2010-06-13 10:33:31\x0922f3563bd43a3c3c044bd1db885f1523,1276457581773
> 
> 
> 
> 
> 
> 
> 
> machine2038:
>
> 2010-06-14 12:57:25,562 INFO org.apache.hadoop.hbase.regionserver.HRegion: Closed MyOwnEventTable,2010-06-13 10:33:31\x0922f3563bd43a3c3c044bd1db885f1523,1276457581773
>
> 2010-06-14 13:15:53,356 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_OPEN: MyOwnEventTable,2010-06-13 10:33:31\x0922f3563bd43a3c3c044bd1db885f1523,1276457581773
>
> 2010-06-14 13:15:53,356 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Worker: MSG_REGION_OPEN: MyOwnEventTable,2010-06-13 10:33:31\x0922f3563bd43a3c3c044bd1db885f1523,1276457581773
> 
> 
> 
> 
> 
> machine2040:
>
> 2010-06-14 12:57:14,214 INFO org.apache.hadoop.hbase.regionserver.HRegion: Closed MyOwnEventTable,2010-06-13 10:33:31\x0922f3563bd43a3c3c044bd1db885f1523,1276457581773
>
> 2010-06-14 13:15:01,266 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_OPEN: MyOwnEventTable,2010-06-13 10:33:31\x0922f3563bd43a3c3c044bd1db885f1523,1276457581773
>
> 2010-06-14 13:15:01,266 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Worker: MSG_REGION_OPEN: MyOwnEventTable,2010-06-13 10:33:31\x0922f3563bd43a3c3c044bd1db885f1523,1276457581773
> 
> 
> 
> 
> 
> 
> 
> machine2041:
>
> 2010-06-14 12:57:44,877 INFO org.apache.hadoop.hbase.regionserver.HRegion: Closed MyOwnEventTable,2010-06-13 10:33:31\x0922f3563bd43a3c3c044bd1db885f1523,1276457581773
>
> 2010-06-14 13:15:48,955 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_OPEN: MyOwnEventTable,2010-06-13 10:33:31\x0922f3563bd43a3c3c044bd1db885f1523,1276457581773
>
> 2010-06-14 13:15:48,955 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Worker: MSG_REGION_OPEN: MyOwnEventTable,2010-06-13 10:33:31\x0922f3563bd43a3c3c044bd1db885f1523,1276457581773
> 
> 
> 
> machine2042:
>
> 2010-06-14 12:57:12,500 INFO org.apache.hadoop.hbase.regionserver.HRegion: Closed MyOwnEventTable,2010-06-13 10:33:31\x0922f3563bd43a3c3c044bd1db885f1523,1276457581773
>
> 2010-06-14 13:14:58,719 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_OPEN: MyOwnEventTable,2010-06-13 10:33:31\x0922f3563bd43a3c3c044bd1db885f1523,1276457581773
>
> 2010-06-14 13:14:58,719 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Worker: MSG_REGION_OPEN: MyOwnEventTable,2010-06-13 10:33:31\x0922f3563bd43a3c3c044bd1db885f1523,1276457581773
> 
> 
> 
> 
> 
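
As a rough check of the timeline in the quoted report, the sketch below computes the delay on machine2037 between receiving MSG_REGION_OPEN and the worker handling it, and compares it with the retry window described in the report. The helper name `open_delay_seconds` is hypothetical, and the 10 x 10 second retry budget is taken from the report as stated, not verified against the HBase source.

```python
from datetime import datetime

# Timestamp layout used in the regionserver log excerpts.
FMT = "%Y-%m-%d %H:%M:%S,%f"

def open_delay_seconds(received, worked):
    """Seconds between the MSG_REGION_OPEN line and the matching Worker line."""
    t0 = datetime.strptime(received, FMT)
    t1 = datetime.strptime(worked, FMT)
    return (t1 - t0).total_seconds()

# Timestamps copied from the machine2037 excerpt.
delay = open_delay_seconds("2010-06-14 12:57:45,812", "2010-06-14 13:14:43,341")

# Retry budget as described in the report: 10 tries, 10 s apart
# (an assumption taken from the message, not checked against HBase).
retry_window = 10 * 10

print(round(delay, 3))       # 1017.529
print(delay > retry_window)  # True
```

On these numbers the open was delayed by roughly 17 minutes, about ten times the 100-second window the report describes.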

