hbase-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "gaojinchao (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-3892) Table can't disable
Date Thu, 02 Jun 2011 11:48:47 GMT

    [ https://issues.apache.org/jira/browse/HBASE-3892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13042715#comment-13042715
] 

gaojinchao commented on HBASE-3892:
-----------------------------------

I am not familiar with unit test(It seems diffult to send a double report of a split and test
cluster function). So
I verified it by modified region server code.

Logs:
2011-06-02 19:57:49,056 INFO org.apache.hadoop.hbase.master.ServerManager: Received REGION_SPLIT:
ufdr2,8613816508215#1308,1307014802589.c5a8a47d9c84f417b9fcc4c8019e7c7e.: Daughters; ufdr2,8613816508215#1308,1307015867020.37481173e31ea469bcaa310cf8d7d980.,
ufdr2,8613816595415#3432,1307015867020.afbf02ef235cabe66026f7c393d79bc0. from C4C4.site,60020,1307015130114
2011-06-02 19:57:49,057 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil: master:60000-0x30502c278f0000
Unable to get data of znode /hbase/unassigned/c5a8a47d9c84f417b9fcc4c8019e7c7e because node
does not exist (not necessarily an error)
2011-06-02 19:57:49,081 INFO org.apache.hadoop.hbase.master.ServerManager: Received REGION_SPLIT:
ufdr2,8613816508215#1308,1307014802589.c5a8a47d9c84f417b9fcc4c8019e7c7e.: Daughters; ufdr2,8613816508215#1308,1307015867020.37481173e31ea469bcaa310cf8d7d980.,
ufdr2,8613816595415#3432,1307015867020.afbf02ef235cabe66026f7c393d79bc0. from C4C4.site,60020,1307015130114
2011-06-02 19:57:49,083 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil: master:60000-0x30502c278f0000
Unable to get data of znode /hbase/unassigned/c5a8a47d9c84f417b9fcc4c8019e7c7e because node
does not exist (not necessarily an error)
2011-06-02 19:57:49,083 WARN org.apache.hadoop.hbase.master.AssignmentManager: Trying to process
the split of 37481173e31ea469bcaa310cf8d7d980, but it was already done and one daughter is
on region server serverName=C4C4.site,60020,1307015130114, load=(requests=0, regions=0, usedHeap=0,
maxHeap=0)
2011-06-02 19:57:56,468 INFO org.apache.hadoop.hbase.master.ServerManager: Received REGION_SPLIT:
ufdr2,8613819021840#1446,1307014756068.f865d41d918297f30b576b9ea3ccea07.: Daughters; ufdr2,8613819021840#1446,1307015873554.baa21e4f0cfa5840f009d0fac8e83d15.,
ufdr2,8613819104397#3916,1307015873554.fb63f608e5e37f5e85d71c925bc78010. from C4C3.site,60020,1307015129703
2011-06-02 19:57:56,470 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil: master:60000-0x30502c278f0000
Unable to get data of znode /hbase/unassigned/f865d41d918297f30b576b9ea3ccea07 because node
does not exist (not necessarily an error)
2011-06-02 19:57:56,472 INFO org.apache.hadoop.hbase.master.ServerManager: Received REGION_SPLIT:
ufdr2,8613819021840#1446,1307014756068.f865d41d918297f30b576b9ea3ccea07.: Daughters; ufdr2,8613819021840#1446,1307015873554.baa21e4f0cfa5840f009d0fac8e83d15.,
ufdr2,8613819104397#3916,1307015873554.fb63f608e5e37f5e85d71c925bc78010. from C4C3.site,60020,1307015129703
2011-06-02 19:57:56,474 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil: master:60000-0x30502c278f0000
Unable to get data of znode /hbase/unassigned/f865d41d918297f30b576b9ea3ccea07 because node
does not exist (not necessarily an error)
2011-06-02 19:57:56,474 WARN org.apache.hadoop.hbase.master.AssignmentManager: Trying to process
the split of baa21e4f0cfa5840f009d0fac8e83d15, but it was already done and one daughter is
on region server serverName=C4C3.site,60020,1307015129703, load=(requests=0, regions=0, usedHeap=0,
maxHeap=0)


Thanks for your hint. It should be a 60 seconds timeout. Region server repeated message about
60s.

2011-05-08 17:43:45,507 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Connected
to master at C4C1.site:60000
2011-05-08 17:44:45,521 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Connected
to master at C4C1.site:60000

It seems a race with regionsInTransition. So IPC was blocked. I try to reproduce it.

Hmaster logs:  It recieved so much message RS_ZK_REGION_CLOSED. 
2011-05-08 17:43:45,157 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil: master:60000-0x22fcd582836003d
Retrieved 125 byte(s) of data from znode /hbase/unassigned/83c05d9ead23d9a260edf30dc8739cf7
and set watcher; region=ufdr,2011050802#8613815394007#0610,1304847545412.83c05d9ead23d9a260edf30dc8739cf7.,
server=C4C4.site,60020,1304820199467, state=RS_ZK_REGION_CLOSING
2011-05-08 17:43:45,525 INFO org.apache.hadoop.hbase.master.ServerManager: Received REGION_SPLIT:
ufdr,2011050812#8613817306227#0516,1304845660567.8e9a3b05abe1c3a692999cf5e8dfd9dd.: Daughters;
ufdr,2011050812#8613817306227#0516,1304847764729.5e4bca85c33fa6605ffc9a5c2eb94e62., ufdr,2011050812#8613817398167#4032,1304847764729.4418fb197685a21f77e151e401cf8b66.
from C4C4.site,60020,1304820199467
2011-05-08 17:43:48,943 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil: master:60000-0x22fcd582836003d
Retrieved 125 byte(s) of data from znode /hbase/unassigned/5e3bacf3f43b6bad874e80c2f971e632
and set watcher; region=ufdr,2011050803#8613819042855#3419,1304844562298.5e3bacf3f43b6bad874e80c2f971e632.,
server=C4C5.site,60020,1304820199469, state=RS_ZK_REGION_CLOSED
2011-05-08 17:43:50,777 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil: master:60000-0x22fcd582836003d
Retrieved 125 byte(s) of data from znode /hbase/unassigned/aaddc1ecd4ab768357b3a44df6828e23
and set watcher; region=ufdr,2011050810#8613815306434#1853,1304823564447.aaddc1ecd4ab768357b3a44df6828e23.,
server=C4C5.site,60020,1304820199469, state=RS_ZK_REGION_CLOSED
2011-05-08 17:43:53,219 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil: master:60000-0x22fcd582836003d
Retrieved 125 byte(s) of data from znode /hbase/unassigned/6bf4788ea7349a62967b6781b37cf157
and set watcher; region=ufdr,2011050802#8613816510664#5948,1304842169086.6bf4788ea7349a62967b6781b37cf157.,
server=C4C4.site,60020,1304820199467, state=RS_ZK_REGION_CLOSED
2011-05-08 17:43:55,812 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil: master:60000-0x22fcd582836003d
Retrieved 125 byte(s) of data from znode /hbase/unassigned/3decca49c2dbc3f8a9d49f192b747b12
and set watcher; region=ufdr,2011050801#8613819767209#4733,1304837619424.3decca49c2dbc3f8a9d49f192b747b12.,
server=C4C3.site,60020,1304820199474, state=RS_ZK_REGION_CLOSED
2011-05-08 17:43:56,722 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil: master:60000-0x22fcd582836003d
Retrieved 125 byte(s) of data from znode /hbase/unassigned/49cec92a2e7dd6d8ef38aab6128b29f8
and set watcher; region=ufdr,2011050802#8613815474662#2305,1304847545412.49cec92a2e7dd6d8ef38aab6128b29f8.,
server=C4C4.site,60020,1304820199467, state=RS_ZK_REGION_CLOSED
2011-05-08 17:44:01,382 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil: master:60000-0x22fcd582836003d
Retrieved 125 byte(s) of data from znode /hbase/unassigned/fcaaff18b3afa53e2bae3c9eb7b3aca2
and set watcher; region=ufdr,2011050803#8613819212381#4006,1304845030265.fcaaff18b3afa53e2bae3c9eb7b3aca2.,
server=C4C5.site,60020,1304820199469, state=RS_ZK_REGION_CLOSED
2011-05-08 17:44:01,383 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil: master:60000-0x22fcd582836003d
Retrieved 125 byte(s) of data from znode /hbase/unassigned/6fa30b1a2352e36e0b5c9272b61e17cf
and set watcher; region=ufdr,2011050801#8613818694123#3028,1304847154793.6fa30b1a2352e36e0b5c9272b61e17cf.,
server=C4C4.site,60020,1304820199467, state=RS_ZK_REGION_CLOSED





> Table can't disable
> -------------------
>
>                 Key: HBASE-3892
>                 URL: https://issues.apache.org/jira/browse/HBASE-3892
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.3
>            Reporter: gaojinchao
>             Fix For: 0.90.4
>
>         Attachments: AssignmentManager_90.patch, AssignmentManager_90v2.patch, AssignmentManager_90v3.patch,
logs.rar
>
>
> In TimeoutMonitor : 
> if node exists and node state is RS_ZK_REGION_CLOSED
> We should send a zk message again when close region is timeout.
> in this case, It may be loss some message.
> I See. It seems like a bug. This is my analysis.
> // disable table and master sent Close message to region server, Region state was set
PENDING_CLOSE
> 2011-05-08 17:44:25,745 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Sent
CLOSE to serverName=C4C4.site,60020,1304820199467, load=(requests=0, regions=123, usedHeap=4097,
maxHeap=8175) for region ufdr,2011050812#8613817398167#4032,1304847764729.4418fb197685a21f77e151e401cf8b66.
> 2011-05-08 17:44:45,530 INFO org.apache.hadoop.hbase.master.ServerManager: Received REGION_SPLIT:
ufdr,2011050812#8613817306227#0516,1304845660567.8e9a3b05abe1c3a692999cf5e8dfd9dd.: Daughters;
ufdr,2011050812#8613817306227#0516,1304847764729.5e4bca85c33fa6605ffc9a5c2eb94e62., ufdr,2011050812#8613817398167#4032,1304847764729.4418fb197685a21f77e151e401cf8b66.
from C4C4.site,60020,1304820199467
> 2011-05-08 17:45:45,542 INFO org.apache.hadoop.hbase.master.ServerManager: Received REGION_SPLIT:
ufdr,2011050812#8613817306227#0516,1304845660567.8e9a3b05abe1c3a692999cf5e8dfd9dd.: Daughters;
ufdr,2011050812#8613817306227#0516,1304847764729.5e4bca85c33fa6605ffc9a5c2eb94e62., ufdr,2011050812#8613817398167#4032,1304847764729.4418fb197685a21f77e151e401cf8b66.
from C4C4.site,60020,1304820199467
> // received splitting message and cleared Region state (PENDING_CLOSE)
> 2011-05-08 17:46:45,303 WARN org.apache.hadoop.hbase.master.AssignmentManager: Overwriting
4418fb197685a21f77e151e401cf8b66 on serverName=C4C4.site,60020,1304820199467, load=(requests=0,
regions=123, usedHeap=4097, maxHeap=8175)
> 2011-05-08 17:46:45,538 INFO org.apache.hadoop.hbase.master.ServerManager: Received REGION_SPLIT:
ufdr,2011050812#8613817306227#0516,1304845660567.8e9a3b05abe1c3a692999cf5e8dfd9dd.: Daughters;
ufdr,2011050812#8613817306227#0516,1304847764729.5e4bca85c33fa6605ffc9a5c2eb94e62., ufdr,2011050812#8613817398167#4032,1304847764729.4418fb197685a21f77e151e401cf8b66.
from C4C4.site,60020,1304820199467
> 2011-05-08 17:47:45,548 INFO org.apache.hadoop.hbase.master.ServerManager: Received REGION_SPLIT:
ufdr,2011050812#8613817306227#0516,1304845660567.8e9a3b05abe1c3a692999cf5e8dfd9dd.: Daughters;
ufdr,2011050812#8613817306227#0516,1304847764729.5e4bca85c33fa6605ffc9a5c2eb94e62., ufdr,2011050812#8613817398167#4032,1304847764729.4418fb197685a21f77e151e401cf8b66.
from C4C4.site,60020,1304820199467
> 2011-05-08 17:48:45,545 INFO org.apache.hadoop.hbase.master.ServerManager: Received REGION_SPLIT:
ufdr,2011050812#8613817306227#0516,1304845660567.8e9a3b05abe1c3a692999cf5e8dfd9dd.: Daughters;
ufdr,2011050812#8613817306227#0516,1304847764729.5e4bca85c33fa6605ffc9a5c2eb94e62., ufdr,2011050812#8613817398167#4032,1304847764729.4418fb197685a21f77e151e401cf8b66.
from C4C4.site,60020,1304820199467
> 2011-05-08 17:49:46,108 INFO org.apache.hadoop.hbase.master.ServerManager: Received REGION_SPLIT:
ufdr,2011050812#8613817306227#0516,1304845660567.8e9a3b05abe1c3a692999cf5e8dfd9dd.: Daughters;
ufdr,2011050812#8613817306227#0516,1304847764729.5e4bca85c33fa6605ffc9a5c2eb94e62., ufdr,2011050812#8613817398167#4032,1304847764729.4418fb197685a21f77e151e401cf8b66.
from C4C4.site,60020,1304820199467
> 2011-05-08 17:50:46,105 INFO org.apache.hadoop.hbase.master.ServerManager: Received REGION_SPLIT:
ufdr,2011050812#8613817306227#0516,1304845660567.8e9a3b05abe1c3a692999cf5e8dfd9dd.: Daughters;
ufdr,2011050812#8613817306227#0516,1304847764729.5e4bca85c33fa6605ffc9a5c2eb94e62., ufdr,2011050812#8613817398167#4032,1304847764729.4418fb197685a21f77e151e401cf8b66.
from C4C4.site,60020,1304820199467
> 2011-05-08 17:51:46,117 INFO org.apache.hadoop.hbase.master.ServerManager: Received REGION_SPLIT:
ufdr,2011050812#8613817306227#0516,1304845660567.8e9a3b05abe1c3a692999cf5e8dfd9dd.: Daughters;
ufdr,2011050812#8613817306227#0516,1304847764729.5e4bca85c33fa6605ffc9a5c2eb94e62., ufdr,2011050812#8613817398167#4032,1304847764729.4418fb197685a21f77e151e401cf8b66.
from C4C4.site,60020,1304820199467
> 2011-05-08 17:52:46,112 INFO org.apache.hadoop.hbase.master.ServerManager: Received REGION_SPLIT:
ufdr,2011050812#8613817306227#0516,1304845660567.8e9a3b05abe1c3a692999cf5e8dfd9dd.: Daughters;
ufdr,2011050812#8613817306227#0516,1304847764729.5e4bca85c33fa6605ffc9a5c2eb94e62., ufdr,2011050812#8613817398167#4032,1304847764729.4418fb197685a21f77e151e401cf8b66.
from C4C4.site,60020,1304820199467
> 2011-05-08 17:52:47,309 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil: master:60000-0x22fcd582836003d
Retrieved 125 byte(s) of data from znode /hbase/unassigned/4418fb197685a21f77e151e401cf8b66
and set watcher; region=ufdr,2011050812#8613817398167#4032,1304847764729.4418fb197685a21f77e151e401cf8b66.,
server=C4C4.site,60020,1304820199467, state=RS_ZK_REGION_CLOSED
> 2011-05-08 17:52:47,388 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling
new unassigned node: /hbase/unassigned/4418fb197685a21f77e151e401cf8b66 (region=ufdr,2011050812#8613817398167#4032,1304847764729.4418fb197685a21f77e151e401cf8b66.,
server=C4C4.site,60020,1304820199467, state=RS_ZK_REGION_CLOSED)
> 2011-05-08 17:52:47,388 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling
transition=RS_ZK_REGION_CLOSED, server=C4C4.site,60020,1304820199467, region=4418fb197685a21f77e151e401cf8b66
> // region server had closed region, but the region state had cleared. So it printed warning
log.
> 2011-05-08 17:52:47,388 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received
CLOSED for region 4418fb197685a21f77e151e401cf8b66 from server C4C4.site,60020,1304820199467
but region was in  the state null and not in expected PENDING_CLOSE or CLOSING states
> 2011-05-08 17:52:47,397 WARN org.apache.hadoop.hbase.master.AssignmentManager: Overwriting
4418fb197685a21f77e151e401cf8b66 on serverName=C4C4.site,60020,1304820199467, load=(requests=0,
regions=123, usedHeap=4097, maxHeap=8175)
> // The region state was set PENDING_CLOSE again.  the table couldn't disable and enable.
> 2011-05-08 17:52:47,398 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Starting
unassignment of region ufdr,2011050812#8613817398167#4032,1304847764729.4418fb197685a21f77e151e401cf8b66.
(offlining)

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

Mime
View raw message