hbase-issues mailing list archives

From "Sergey Shelukhin (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-7268) correct local region location cache information can be overwritten w/stale information from an old server
Date Wed, 05 Dec 2012 21:33:58 GMT

    [ https://issues.apache.org/jira/browse/HBASE-7268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13510798#comment-13510798
] 

Sergey Shelukhin commented on HBASE-7268:
-----------------------------------------

OK, I have a repro; I will attach a patch after cleaning up the debug logging, etc.
I'd prefer to have a timestamp (TS) in meta, but this is a simpler fix for now.
With the patch, the logging looks like this:
{code}
2012-12-05 12:06:08,285 DEBUG [Thread-521] util.ChaosMonkey$Action(203): Removing 13 regions
from 10.10.11.17,53406,1354737903944
...
2012-12-05 12:06:08,765 INFO  [am-zkevent-worker-pool-2-thread-2] master.RegionStates(249):
Region {NAME => 'IntegrationTestRebalanceAndKillServersTargeted,7ffffff8,1354737916774.89483778064d05b1f2e1c0d20bcabc16.',
STARTKEY => '7ffffff8', ENDKEY => '8cccccc4',
ENCODED => 89483778064d05b1f2e1c0d20bcabc16,} transitioned from {IntegrationTestRebalanceAndKillServersTargeted,7ffffff8,1354737916774.89483778064d05b1f2e1c0d20bcabc16.
state=PENDING_OPEN, ts=1354737968742, server=10.10.11.17,53407,1354737903960} to {IntegrationTestRebalanceAndKillServersTargeted,7ffffff8,1354737916774.89483778064d05b1f2e1c0d20bcabc16.
state=OPENING, ts=1354737968765, server=10.10.11.17,53407,1354737903960}
...
2012-12-05 12:06:10,549 INFO  [Thread-521] util.ChaosMonkey$Action(179): Killing region server:10.10.11.17,53407,1354737903960
...
2012-12-05 12:06:39,233 INFO  [am-zkevent-worker-pool-2-thread-2] master.RegionStates(249):
Region {NAME => 'IntegrationTestRebalanceAndKillServersTargeted,7ffffff8,1354737916774.89483778064d05b1f2e1c0d20bcabc16.',
STARTKEY => '7ffffff8', ENDKEY => '8cccccc4',
ENCODED => 89483778064d05b1f2e1c0d20bcabc16,} transitioned from {IntegrationTestRebalanceAndKillServersTargeted,7ffffff8,1354737916774.89483778064d05b1f2e1c0d20bcabc16.
state=OPENING, ts=1354737999228, server=10.10.11.17,53404,1354737903902} to {IntegrationTestRebalanceAndKillServersTargeted,7ffffff8,1354737916774.89483778064d05b1f2e1c0d20bcabc16.
state=OPEN, ts=1354737999232, server=10.10.11.17,53404,1354737903902}
...
2012-12-05 12:06:40,276 INFO  [HBaseWriterThread_4] client.HConnectionManager$HConnectionImplementation(1776):
Received an error from 10.10.11.17:53407 for region IntegrationTestRebalanceAndKillServersTargeted,7ffffff8,1354737916774.89483778064d05b1f2e1c0d20bcabc16.;
not removing 10.10.11.17:53404 from cache.
...
2012-12-05 12:06:40,381 INFO  [HBaseWriterThread_15] client.HConnectionManager$HConnectionImplementation(1809):
Region IntegrationTestRebalanceAndKillServersTargeted,7ffffff8,1354737916774.89483778064d05b1f2e1c0d20bcabc16.
moved to 10.10.11.17:53407 according to 10.10.11.17:53406
2012-12-05 12:06:40,381 DEBUG [HBaseWriterThread_15] client.HConnectionManager$HConnectionImplementation(1342):
Ignoring stale location update for IntegrationTestRebalanceAndKillServersTargeted,7ffffff8,1354737916774.89483778064d05b1f2e1c0d20bcabc16.:
10.10.11.17:53407 at 1354737968725; local 10.10.11.17:53404 at 1354738000265
{code}
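
For readers following the last two log lines: the "Ignoring stale location update" message suggests the client keeps a timestamp next to each cached location and drops any update that carries an older one. Below is a rough sketch of that idea only. It is not the patch and not HBase client code; the class, field, and method names (LocationCacheSketch, TimedLocation, update, get) are all made up for illustration, and the assumption that the rule is a plain "newer timestamp wins" comparison is mine.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical illustration; none of these names exist in HBase.
final class LocationCacheSketch {

    /** A cached server address plus the time at which that location was recorded. */
    static final class TimedLocation {
        final String hostAndPort;
        final long timestamp;

        TimedLocation(String hostAndPort, long timestamp) {
            this.hostAndPort = hostAndPort;
            this.timestamp = timestamp;
        }
    }

    // Region name -> most recent known location.
    private final ConcurrentMap<String, TimedLocation> cache = new ConcurrentHashMap<>();

    /**
     * Offers a new location for a region.
     * Returns true if the cache now holds this update, false if it was
     * ignored because the cached entry is at least as recent (assumed rule).
     */
    boolean update(String regionName, String hostAndPort, long timestamp) {
        TimedLocation candidate = new TimedLocation(hostAndPort, timestamp);
        // merge() runs atomically per key, so concurrent writer threads
        // cannot replace a newer entry with an older one.
        TimedLocation winner = cache.merge(regionName, candidate,
            (existing, incoming) ->
                incoming.timestamp > existing.timestamp ? incoming : existing);
        return winner == candidate;
    }

    /** Returns the cached host:port for the region, or null if none is cached. */
    String get(String regionName) {
        TimedLocation loc = cache.get(regionName);
        return loc == null ? null : loc.hostAndPort;
    }

    public static void main(String[] args) {
        LocationCacheSketch c = new LocationCacheSketch();
        // Values taken from the last log line above; "R" stands in for the region name.
        System.out.println(c.update("R", "10.10.11.17:53404", 1354738000265L));
        System.out.println(c.update("R", "10.10.11.17:53407", 1354737968725L));
        System.out.println(c.get("R"));
    }
}
```

The strict comparison means an update with an equal timestamp is also ignored; whether that is the right tie-break is a design choice this sketch does not settle.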
                
> correct local region location cache information can be overwritten w/stale information
from an old server
> ---------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-7268
>                 URL: https://issues.apache.org/jira/browse/HBASE-7268
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.96.0
>            Reporter: Sergey Shelukhin
>            Assignee: Sergey Shelukhin
>            Priority: Minor
>
> Discovered via HBASE-7250; related to HBASE-5877.
> Test is writing from multiple threads.
> Server A has region R; client knows that.
> R gets moved from A to server B.
> B gets killed.
> R gets moved by master to server C.
> ~15 seconds later, client tries to write to it (on A?).
> Multiple client threads report from RegionMoved exception processing logic "R moved from
C to B", even though such transition never happened (neither in nor before the sequence described
below). Not quite sure how the client learned of the transition to C, I assume it's from meta
from some other thread...
> Then, put fails (it may fail due to accumulated errors that are not logged, which I am
investigating... but the bogus cache update is there notwithstanding).
> I have a patch but am not sure if it works; the test still fails locally for an as yet unknown reason.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
