hbase-dev mailing list archives

From "stack (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-1609) [part of hbase-1583] We wait on leases to expire before regionserver goes down. Rather, just let client fail
Date Fri, 17 Jul 2009 04:27:14 GMT

    [ https://issues.apache.org/jira/browse/HBASE-1609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12732334#action_12732334 ]

stack commented on HBASE-1609:
------------------------------

Testing puts, I see the below in the client log when we shut down in the middle of an upload:

{code}
2009-07-17 04:10:04,645 DEBUG org.apache.hadoop.hbase.client.HConnectionManager$TableServers:
Reloading region TestTable,\x00\x03\x02\x04\x05\x07\x09\x04\x00\x03,1247725178498 location
because regionserver didn't accept updates; tries=0 of max=10, waiting=2000ms
2009-07-17 04:10:06,904 DEBUG org.apache.hadoop.hbase.client.HConnectionManager$TableServers:
Reloading region TestTable,\x00\x03\x02\x04\x05\x07\x09\x04\x00\x03,1247725178498 location
because regionserver didn't accept updates; tries=1 of max=10, waiting=2000ms
2009-07-17 04:10:09,015 DEBUG org.apache.hadoop.hbase.client.HConnectionManager$TableServers:
Reloading region TestTable,\x00\x03\x02\x04\x05\x07\x09\x04\x00\x03,1247725178498 location
because regionserver didn't accept updates; tries=2 of max=10, waiting=2000ms
2009-07-17 04:10:11,068 DEBUG org.apache.hadoop.hbase.client.HConnectionManager$TableServers:
Reloading region TestTable,\x00\x03\x02\x04\x05\x07\x09\x04\x00\x03,1247725178498 location
because regionserver didn't accept updates; tries=3 of max=10, waiting=4000ms
2009-07-17 04:10:15,107 DEBUG org.apache.hadoop.hbase.client.HConnectionManager$TableServers:
Reloading region TestTable,\x00\x03\x02\x04\x05\x07\x09\x04\x00\x03,1247725178498 location
because regionserver didn't accept updates; tries=4 of max=10, waiting=4000ms
2009-07-17 04:10:19,216 DEBUG org.apache.hadoop.hbase.client.HConnectionManager$TableServers:
Reloading region TestTable,\x00\x03\x02\x04\x05\x07\x09\x04\x00\x03,1247725178498 location
because regionserver didn't accept updates; tries=5 of max=10, waiting=8000ms
2009-07-17 04:10:27,490 DEBUG org.apache.hadoop.hbase.client.HConnectionManager$TableServers:
Reloading region TestTable,\x00\x03\x02\x04\x05\x07\x09\x04\x00\x03,1247725178498 location
because regionserver didn't accept updates; tries=6 of max=10, waiting=8000ms
2009-07-17 04:10:35,534 DEBUG org.apache.hadoop.hbase.client.HConnectionManager$TableServers:
Reloading region TestTable,\x00\x03\x02\x04\x05\x07\x09\x04\x00\x03,1247725178498 location
because regionserver didn't accept updates; tries=7 of max=10, waiting=16000ms
2009-07-17 04:10:52,446 DEBUG org.apache.hadoop.hbase.client.HConnectionManager$TableServers:
Reloading region TestTable,\x00\x03\x02\x04\x05\x07\x09\x04\x00\x03,1247725178498 location
because regionserver didn't accept updates; tries=8 of max=10, waiting=32000ms
2009-07-17 04:11:24,514 WARN org.apache.hadoop.mapred.TaskTracker: Error running child
org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to contact region server
Some server, retryOnlyOne=true, index=0, islastrow=false, tries=9, numtries=10, i=0, listsize=8643,
location=address: X.X.X.141:60020, regioninfo: REGION => {NAME => 'TestTable,\x00\x03\x02\x04\x05\x07\x09\x04\x00\x03,1247725178498',
STARTKEY => '\x00\x03\x02\x04\x05\x07\x09\x04\x00\x03', ENDKEY => '\x00\x03\x02\x04\x06\x07\x07\x00\x06\x06',
ENCODED => 1615573, TABLE => {{NAME => 'TestTable', FAMILIES => [{NAME => 'info',
COMPRESSION => 'NONE', VERSIONS => '3', TTL => '2147483647', BLOCKSIZE => '65536',
IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}}, region=TestTable,\x00\x03\x02\x04\x05\x07\x09\x04\x00\x03,1247725178498
for region TestTable,\x00\x03\x02\x04\x05\x07\x09\x04\x00\x03,1247725178498, row '\x00\x03\x02\x04\x06\x03\x09\x01\x02\x06',
but failed after 10 attempts.
{code}
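
For reference, the waits in the log above look like a fixed multiplier schedule applied to a 2000ms base pause. Here is a minimal sketch that reproduces those numbers. The multiplier table is inferred only from the logged `waiting=` values, and the class and method names are made up for illustration; none of this is copied from the HBase client code.

```java
public final class RetryBackoffSketch {
    // Multipliers inferred from the logged waits (2000, 2000, 2000, 4000, 4000,
    // 8000, 8000, 16000, 32000 ms). This is an assumption, not HBase source.
    static final int[] BACKOFF = {1, 1, 1, 2, 2, 4, 4, 8, 16};

    /** Wait before the next attempt, given the base pause and the zero-based try count. */
    public static long waitMillis(long pauseMillis, int tries) {
        // Clamp so a try count past the end of the table reuses the last multiplier.
        int idx = Math.min(tries, BACKOFF.length - 1);
        return pauseMillis * BACKOFF[idx];
    }

    /** Total time spent sleeping over the first {@code sleeps} waits. */
    public static long totalWaitMillis(long pauseMillis, int sleeps) {
        long total = 0;
        for (int t = 0; t < sleeps; t++) {
            total += waitMillis(pauseMillis, t);
        }
        return total;
    }

    public static void main(String[] args) {
        long pause = 2000L; // base pause, taken from the first "waiting=2000ms" line
        for (int t = 0; t < BACKOFF.length; t++) {
            System.out.println("tries=" + t + ", waiting=" + waitMillis(pause, t) + "ms");
        }
        System.out.println("total=" + totalWaitMillis(pause, BACKOFF.length) + "ms");
    }
}
```

Summing the nine waits gives 78000ms, which lines up with the roughly 80 seconds between the first DEBUG line (04:10:04) and the RetriesExhaustedException (04:11:24).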

I think I see connection refused too.

That ain't bad I'd say.

This is with zk not managed by hbase.  If I shut down a cluster where hbase is managing the
zk quorum (i.e. it's shut down as part of hbase shutdown), then I see the client log filled with
zk complaints, with the above intermixed.

Scanning, I see an EOFException, apparently because the server went down while returning a result.

Exceptions ain't pretty but I don't see anything inherently wrong.  Will go ahead and commit.

With this commit, our new philosophy is: no more trying to be Mr. Nice Guy to clients
if the admin wants the cluster to go down.

> [part of hbase-1583] We wait on leases to expire before regionserver goes down.  Rather, just let client fail
> -------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-1609
>                 URL: https://issues.apache.org/jira/browse/HBASE-1609
>             Project: Hadoop HBase
>          Issue Type: Improvement
>            Reporter: stack
>            Assignee: stack
>             Fix For: 0.20.0
>
>         Attachments: 1609-v2.patch, 1609.patch
>
>
> Addressing this issue will help hbase-1583.  We should do for 0.20.0 and perhaps for
> 0.19.x even.
> Currently, if outstanding leases, in HRegion close, we'll hang until lease expires.
> Could be a minute.  Could be worse, the client might come in and renew the lease a few times
> at least till it finishes out the region.  This gets in way of regionserver shutting down
> fast.
> J-D suggests that regionserver should just go down and outstanding clients should fail
> rather than try and be nice to outstanding clients (in his case, his MR job had failed so
> no clients... but we insist on lease expiring).
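
To make the trade-off in the description concrete, here is a toy model of the two close behaviours: blocking until the last lease expires versus dropping leases and refusing renewals. It is only a sketch. `LeaseShutdownSketch` and its methods are hypothetical names, it does not reflect the actual HRegion or lease code, and the 60-second lease period in the usage note comes from the "could be a minute" remark above.

```java
import java.util.HashMap;
import java.util.Map;

public final class LeaseShutdownSketch {
    // Lease name -> absolute expiry time in milliseconds (hypothetical bookkeeping).
    private final Map<String, Long> expiryByLease = new HashMap<>();
    private boolean closed = false;

    /** Create or renew a lease. Refused once the server has started closing. */
    public boolean renew(String name, long nowMillis, long periodMillis) {
        if (closed) {
            return false; // the client call fails instead of prolonging shutdown
        }
        expiryByLease.put(name, nowMillis + periodMillis);
        return true;
    }

    /** Old behaviour as described: how long a close would block waiting on leases. */
    public long millisUntilAllExpire(long nowMillis) {
        long latest = nowMillis;
        for (long expiry : expiryByLease.values()) {
            latest = Math.max(latest, expiry);
        }
        return latest - nowMillis;
    }

    /** New behaviour as described: drop outstanding leases and close at once. */
    public int closeNow() {
        int dropped = expiryByLease.size();
        expiryByLease.clear();
        closed = true;
        return dropped;
    }
}
```

With one lease renewed at time 0 for 60000ms, `millisUntilAllExpire(10000)` reports 50000ms of waiting, and every renewal in the meantime pushes that further out. After `closeNow()` the wait is zero and later `renew` calls return false, which is the "let the client fail" side of the trade.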

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

