hbase-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "stack (JIRA)" <j...@apache.org>
Subject [jira] Reopened: (HBASE-3368) Split message can come in before region opened message; results in 'Region has been PENDING_CLOSE for too long' cycle
Date Thu, 03 Feb 2011 07:32:29 GMT

     [ https://issues.apache.org/jira/browse/HBASE-3368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

stack reopened HBASE-3368:
--------------------------


Reopening.  Here is full transcript for problem region, a region now stuck in RIT:

{code}
2011-02-03 06:41:29,466 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:60000-0x12dea40e4560000
Async create of unassigned node for da7cdcde7f10405c294144a35709f90f with OFFLINE state
2011-02-03 06:41:29,601 DEBUG org.apache.hadoop.hbase.master.AssignmentManager$CreateUnassignedAsyncCallback:
rs=TestTable,0073456533,1296708597440.da7cdcde7f10405c294144a35709f90f. state=OFFLINE, ts=1296715289466,
server=sv2borg181,60020,1296715280082
2011-02-03 06:41:29,826 DEBUG org.apache.hadoop.hbase.master.AssignmentManager$ExistsUnassignedAsyncCallback:
rs=TestTable,0073456533,1296708597440.da7cdcde7f10405c294144a35709f90f. state=OFFLINE, ts=1296715289466
2011-02-03 06:41:41,231 INFO org.apache.hadoop.hbase.master.ServerManager: Received REGION_SPLIT:
TestTable,0073456533,1296708597440.da7cdcde7f10405c294144a35709f90f.: Daughters; TestTable,0073456533,1296715299597.51d04ac711214779d429326cfc46f4df.,
TestTable,0073464447,1296715299597.f15a72e22cc041653142e5cf2e788ee8. from sv2borg181,60020,1296715280082
2011-02-03 06:41:42,172 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil: master:60000-0x12dea40e4560000
Retrieved 112 byte(s) of data from znode /hbase/unassigned/da7cdcde7f10405c294144a35709f90f;
data=region=TestTable,0073456533,1296708597440.da7cdcde7f10405c294144a35709f90f., server=sv2borg181,60020,1296715280082,
state=RS_ZK_REGION_OPENED
2011-02-03 06:41:42,172 WARN org.apache.hadoop.hbase.master.AssignmentManager: Split report
has RIT node (shouldnt have one): REGION => {NAME => 'TestTable,0073456533,1296708597440.da7cdcde7f10405c294144a35709f90f.',
STARTKEY => '0073456533', ENDKEY => '0073472373', ENCODED => da7cdcde7f10405c294144a35709f90f,
TABLE => {{NAME => 'TestTable', FAMILIES => [{NAME => 'info', BLOOMFILTER =>
'NONE', REPLICATION_SCOPE => '0', VERSIONS => '3', COMPRESSION => 'NONE', TTL =>
'2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}}
node: region=TestTable,0073456533,1296708597440.da7cdcde7f10405c294144a35709f90f., server=sv2borg181,60020,1296715280082,
state=RS_ZK_REGION_OPENED
2011-02-03 06:41:46,305 DEBUG org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher: master:60000-0x12dea40e4560000
Received ZooKeeper Event, type=NodeDataChanged, state=SyncConnected, path=/hbase/unassigned/da7cdcde7f10405c294144a35709f90f
2011-02-03 06:41:46,330 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil: master:60000-0x12dea40e4560000
Retrieved 112 byte(s) of data from znode /hbase/unassigned/da7cdcde7f10405c294144a35709f90f
and set watcher; region=TestTable,0073456533,1296708597440.da7cdcde7f10405c294144a35709f90f.,
server=sv2borg181,60020,1296715280082, state=RS_ZK_REGION_OPENED
2011-02-03 06:41:46,330 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENED,
server=sv2borg181,60020,1296715280082, region=da7cdcde7f10405c294144a35709f90f
2011-02-03 06:41:46,330 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received OPENED
for region da7cdcde7f10405c294144a35709f90f from server sv2borg181,60020,1296715280082 but
region was in  the state null and not in expected PENDING_OPEN or OPENING states
2011-02-03 06:43:45,920 INFO org.apache.hadoop.hbase.catalog.MetaEditor: Deleted daughter
reference TestTable,0073456533,1296715299597.51d04ac711214779d429326cfc46f4df., qualifier=splitA,
from parent TestTable,0073456533,1296708597440.da7cdcde7f10405c294144a35709f90f.
2011-02-03 06:43:45,924 INFO org.apache.hadoop.hbase.catalog.MetaEditor: Deleted daughter
reference TestTable,0073464447,1296715299597.f15a72e22cc041653142e5cf2e788ee8., qualifier=splitB,
from parent TestTable,0073456533,1296708597440.da7cdcde7f10405c294144a35709f90f.
2011-02-03 06:43:45,924 DEBUG org.apache.hadoop.hbase.master.CatalogJanitor: Deleting region
TestTable,0073456533,1296708597440.da7cdcde7f10405c294144a35709f90f. because daughter splits
no longer hold references
2011-02-03 06:43:45,924 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: DELETING region
hdfs://sv2borg180:10000/hbase/TestTable/da7cdcde7f10405c294144a35709f90f
2011-02-03 06:43:45,957 INFO org.apache.hadoop.hbase.catalog.MetaEditor: Deleted region TestTable,0073456533,1296708597440.da7cdcde7f10405c294144a35709f90f.
from META
2011-02-03 07:30:51,115 DEBUG org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher: master:60000-0x12dea40e4560000
Received ZooKeeper Event, type=NodeDeleted, state=SyncConnected, path=/hbase/unassigned/da7cdcde7f10405c294144a35709f90f
{code}


Workaround is deleting the items from zk.  Seems to work fine.

> Split message can come in before region opened message; results in 'Region has been PENDING_CLOSE
for too long' cycle
> ---------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-3368
>                 URL: https://issues.apache.org/jira/browse/HBASE-3368
>             Project: HBase
>          Issue Type: Bug
>            Reporter: stack
>            Assignee: stack
>            Priority: Blocker
>             Fix For: 0.90.0
>
>
> Another good one.  Look at these excerpts from master log:
> {code}
> 2010-12-16 00:49:45,749 INFO org.apache.hadoop.hbase.master.ServerManager: Received REGION_SPLIT:
TestTable,0078922610,1292373363753.490b382bae33642d12cd717b5785698b.: Daughters; TestTable,0078922610,1292460584999.c8b95dfc9a671083bafdaa0341279777.,
TestTable,0078933586,  
> 1292460584999.7cc636c9a7274eec4e784df2efebbca3. from XXX185,60020,1292460570976
> ....
> 2010-12-16 00:49:46,132 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler:
Opened region TestTable,0078922610,1292373363753.490b382bae33642d12cd717b5785698b. on XXX185,60020,1292460570976
> {code}
> ... so the split will have cleared the parent from in-memory data structures and then
the open handler will add them back (though region is offlined, split).
> Then the balancer runs....... only no one is holding the region thats being balanced.
> Over on XXX185 I see the open and then split at these times:
> {code}
> 2010-12-16 00:49:43,740 DEBUG org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler:
Opened TestTable,0078922610,1292373363753.490b382bae33642d12cd717b5785698b.
> .....
> 2010-12-16 00:49:45,003 INFO org.apache.hadoop.hbase.regionserver.SplitTransaction: Starting
split of region TestTable,0078922610,1292373363753.490b382bae33642d12cd717b5785698b.
> {code}
> So, the fact that it takes the Master a while to get around to the zk watcher processing
messes us up.
> Root problem is that we're using two different message buses, zk and then heartbeat.
 Intent is to do all over zk and remove hearbeat but looking at what to do for 0.90.0.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Mime
View raw message