hbase-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Ted Yu (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-3671) Split report before we finish parent region open; workaround till 0.92; Race between split and OPENED processing
Date Fri, 18 Mar 2011 18:40:30 GMT

    [ https://issues.apache.org/jira/browse/HBASE-3671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13008567#comment-13008567
] 

Ted Yu commented on HBASE-3671:
-------------------------------

Minor comment:
{code}
+      LOG.warn("Skipping the onining of " + regionInfo.getRegionNameAsString() +
+        " because regions is NOT in RIT -- presuming this is because it SPLIT");
{code}
should be:
{code}
+      LOG.warn("Skipping the onlining of " + regionInfo.getRegionNameAsString() +
+        " because region is NOT in RIT -- presuming this is because it SPLIT");
{code}


> Split report before we finish parent region open; workaround till 0.92; Race between
split and OPENED processing
> ----------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-3671
>                 URL: https://issues.apache.org/jira/browse/HBASE-3671
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.2
>            Reporter: stack
>         Attachments: 3671.txt
>
>
> This issue is about adding a workaround to 0.90 until we get proper fix in 0.92 (HBASE-3559).
> Here is the sequence of events:
> 1. We start to process OPENED region event.
> 2. We receive a SPLIT of this region report.
> 3. SPLIT processing offline the region and onlines daughters.
> 4. Metascanner runs and clears out the region from .META. deleting it
> 5. The OPENED handler runs.  Marks the region online in Master memory.
> 6. Balancer runs.  Trys to balance a region that has been deleted.
> Loops for ever.
> Here is excerpt from logs.  It happened during startup, lots going on.  Could happen
on regionserver crash I suppose, maybe, but we're susceptible during cluster start:
> {code}
> # We assign the region
> 2011-03-16 15:18:29,053 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:60000-0x22e286f0b9c98f1
Async create of unassigned node for 3516b74d0c9d4458c2f2f715249e3f78 with OFFLINE state
> ...
> 2011-03-16 15:18:32,298 DEBUG org.apache.hadoop.hbase.master.AssignmentManager$CreateUnassignedAsyncCallback:
rs=tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78.
state=OFFLINE, ts=1300313909053, server=sv4borg39,60020,1300313564807
> ...
> 2011-03-16 15:18:32,732 DEBUG org.apache.hadoop.hbase.master.AssignmentManager$ExistsUnassignedAsyncCallback:
rs=tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78.
state=OFFLINE, ts=1300313909053
> ...
> 2011-03-16 15:23:02,114 DEBUG org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher: master:60000-0x22e286f0b9c98f1
Received ZooKeeper Event, type=NodeDataChanged, state=SyncConnected, path=/prodjobs/unassigned/3516b74d0c9d4458c2f2f715249e3f78
> ...
> 2011-03-16 15:23:02,183 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil: master:60000-0x22e286f0b9c98f1
Retrieved 127 byte(s) of data from znode /prodjobs/unassigned/3516b74d0c9d4458c2f2f715249e3f78
and set watcher; region=tsdb,^@^D2McZ@^@^@^A^@^@G^@^@^L^@^@f^@^@^U^@^@�^@^@(^@^C^G,1299401073466.3516b74d0c9d4458c2f2f715249e3f78.,
server=sv4borg39,60020,1300313564807, state=RS_ZK_REGION_OPENED
> 2011-03-16 15:23:02,183 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling
transition=RS_ZK_REGION_OPENED, server=sv4borg39,60020,1300313564807, region=3516b74d0c9d4458c2f2f715249e3f78
> # At this point we've queued an Excecutor to run to process the OPENED event.  Now in
comes the SPLIT.
> 2011-03-16 15:23:18,199 INFO org.apache.hadoop.hbase.master.ServerManager: Received REGION_SPLIT:
tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78.:
Daughters; tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1300314189812.74c51400bb8dfa127fadfd11a04d72f2.,
tsdb,\x00\x042MmD\x88\x00\x00\x01\x00\x00S\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x029\x00\x00(\x00\x03\x03,1300314189812.87b061739a11d0f9d02acfb92ef961a2.
from sv4borg39,60020,1300313564807
> 2011-03-16 15:23:18,870 WARN org.apache.hadoop.hbase.master.AssignmentManager: Split
report has RIT node (shouldnt have one): REGION => {NAME => 'tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78.',
STARTKEY => '\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07',
ENDKEY => '\x00\x043L\xE7\xF50\x00\x00\x01\x00\x00I\x00\x00\x0C\x00\x00f\x00\x00\x0E\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x02u',
ENCODED => 3516b74d0c9d4458c2f2f715249e3f78, TABLE => {{NAME => 'tsdb', FAMILIES
=> [{NAME => 't', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', VERSIONS =>
'3', COMPRESSION => 'LZO', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY =>
'false', BLOCKCACHE => 'true'}]}} node: region=tsdb,^@^D2McZ@^@^@^A^@^@G^@^@^L^@^@f^@^@^U^@^@�^@^@(^@^C^G,1299401073466.3516b74d0c9d4458c2f2f715249e3f78.,
server=sv4borg39,60020,1300313564807, state=RS_ZK_REGION_OPENED
> # Now metascanner runs and actually removes the parent region, deleting it all
> 2011-03-16 15:28:34,352 INFO org.apache.hadoop.hbase.catalog.MetaEditor: Deleted daughter
reference tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1300314189812.74c51400bb8dfa127fadfd11a04d72f2.,
qualifier=splitA, from parent tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78.
> 2011-03-16 15:28:34,356 INFO org.apache.hadoop.hbase.catalog.MetaEditor: Deleted daughter
reference tsdb,\x00\x042MmD\x88\x00\x00\x01\x00\x00S\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x029\x00\x00(\x00\x03\x03,1300314189812.87b061739a11d0f9d02acfb92ef961a2.,
qualifier=splitB, from parent tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78.
> 2011-03-16 15:28:34,356 DEBUG org.apache.hadoop.hbase.master.CatalogJanitor: Deleting
region tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78.
because daughter splits no longer hold references
> 2011-03-16 15:28:34,356 DEBUG org.apache.hadoop.hbase.master.CatalogJanitor: Deleting
region tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78.
because daughter splits no longer hold references
> 2011-03-16 15:28:34,444 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: DELETING
region hdfs://sv4borg29:9000/hbase/tsdb/3516b74d0c9d4458c2f2f715249e3f78
> 2011-03-16 15:28:34,542 INFO org.apache.hadoop.hbase.catalog.MetaEditor: Deleted region
tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78.
from META
> # Now the OPENED executor runs, a good while after the above
> #
> 2011-03-16 15:30:26,679 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler:
Handling OPENED event for 3516b74d0c9d4458c2f2f715249e3f78; deleting unassigned node
> 2011-03-16 15:30:26,679 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:60000-0x22e286f0b9c98f1
Deleting existing unassigned node for 3516b74d0c9d4458c2f2f715249e3f78 that is in expected
state RS_ZK_REGION_OPENED
> 2011-03-16 15:30:26,725 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil: master:60000-0x22e286f0b9c98f1
Retrieved 127 byte(s) of data from znode /prodjobs/unassigned/3516b74d0c9d4458c2f2f715249e3f78;
data=region=tsdb,^@^D2McZ@^@^@^A^@^@G^@^@^L^@^@f^@^@^U^@^@�^@^@(^@^C^G,1299401073466.3516b74d0c9d4458c2f2f715249e3f78.,
server=sv4borg39,60020,1300313564807, state=RS_ZK_REGION_OPENED
> 2011-03-16 15:30:26,875 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:60000-0x22e286f0b9c98f1
Successfully deleted unassigned node for region 3516b74d0c9d4458c2f2f715249e3f78 in expected
state RS_ZK_REGION_OPENED
> 2011-03-16 15:30:27,051 DEBUG org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher: master:60000-0x22e286f0b9c98f1
Received ZooKeeper Event, type=NodeDeleted, state=SyncConnected, path=/prodjobs/unassigned/3516b74d0c9d4458c2f2f715249e3f78
> 2011-03-16 15:30:27,051 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler:
Opened region tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78.
on sv4borg39,60020,1300313564807
> # Now we have a region online in Master's memory but its not out in .META. nor in the
FS.
> # The balancer runs
> 2011-03-16 23:18:41,716 INFO org.apache.hadoop.hbase.master.HMaster: balance hri=tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78.,
src=sv4borg39,60020,1300313564807, dest=sv4borg33,60020,1300342574666
> 2011-03-16 23:18:41,716 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Starting
unassignment of region tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78.
(offlining)
> 2011-03-16 23:18:41,718 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Server
serverName=sv4borg39,60020,1300313564807, load=(requests=2, regions=504, usedHeap=929, maxHeap=6973)
returned org.apache.hadoop.hbase.NotServingRegionException: org.apache.hadoop.hbase.NotServingRegionException:
Received close for tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78.
but we are not serving it for 3516b74d0c9d4458c2f2f715249e3f78
> 2011-03-16 23:20:34,436 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions
in transition timed out:  tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78.
state=PENDING_CLOSE, ts=1300342802734
> 2011-03-16 23:20:34,437 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region
has been PENDING_CLOSE for too long, running forced unassign again on region=tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78.
> 2011-03-16 23:20:34,437 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil: master:60000-0x22e286f0b9c98f1
Set watcher on existing znode /prodjobs/unassigned/3516b74d0c9d4458c2f2f715249e3f78
> 2011-03-16 23:20:34,437 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Starting
unassignment of region tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78.
(offlining)
> 2011-03-16 23:20:34,438 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Attempting
to unassign region tsdb,\x00\x042McZ@\x00\x00\x01\x00\x00G\x00\x00\x0C\x00\x00f\x00\x00\x15\x00\x00\xA9\x00\x00(\x00\x03\x07,1299401073466.3516b74d0c9d4458c2f2f715249e3f78.
which is already pending close but forcing an additional close
> {code}
> Ad infinitum

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

Mime
View raw message