helix-commits mailing list archives

From "Zhen Zhang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HELIX-551) External view & partition states go out of sync
Date Wed, 19 Nov 2014 01:35:33 GMT

    [ https://issues.apache.org/jira/browse/HELIX-551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14217232#comment-14217232 ]

Zhen Zhang commented on HELIX-551:
----------------------------------

This is related to HELIX-552: https://issues.apache.org/jira/browse/HELIX-552

> External view & partition states go out of sync
> -----------------------------------------------
>
>                 Key: HELIX-551
>                 URL: https://issues.apache.org/jira/browse/HELIX-551
>             Project: Apache Helix
>          Issue Type: Bug
>    Affects Versions: 0.6.4
>            Reporter: Varun Sharma
>
> Hi,
> I am seeing the following issue for many partitions in Helix using a simple OnlineOffline
> state model factory. The external view says that the partition has been assigned to 3 hosts.
> However, when I look at the hosts, only 1 of them executed the OFFLINE --> ONLINE transition.
> On the hosts that did not execute the transition, I see the following:
> 2014-11-13 09:29:54,394 [pool-3-thread-11] (HelixStateTransitionHandler.java:206) WARN
>   Force CurrentState on Zk to be stateModel's CurrentState. partitionKey: 490, currentState: ONLINE,
>   message: 12690ce8-8098-46b1-a93d-279604f0e3db, {CREATE_TIMESTAMP=1415870993349,
>   ClusterEventName=idealStateChange, EXECUTE_START_TIMESTAMP=1415870994382,
>   EXE_SESSION_ID=149a14ada0d0013, FROM_STATE=OFFLINE, MSG_ID=12690ce8-8098-46b1-a93d-279604f0e3db,
>   MSG_STATE=read, MSG_TYPE=STATE_TRANSITION, PARTITION_NAME=490, READ_TIMESTAMP=1415870993787,
>   RESOURCE_NAME=$terrapin$data$meta_pin_join$1415866960201, SRC_NAME=hdfsterrapin-a-namenode001_9090,
>   SRC_SESSION_ID=147a7beb2dd8ed7, STATE_MODEL_DEF=OnlineOffline, STATE_MODEL_FACTORY_NAME=DEFAULT,
>   TGT_NAME=hdfsterrapin-a-datanode-ba3ad256, TGT_SESSION_ID=149a14ada0d0013, TO_STATE=ONLINE}{}{}

> When I grep the message ID in the controller, I see the following:
> 2014-11-14 09:34:56,265 [StatusDumpTimerTask] (ZKPathDataDumpTask.java:155) INFO  {
>   "id" : "149a14ada0d0013__$terrapin$data$meta_pin_join$1415866960201",
>   "mapFields" : {
>     "HELIX_ERROR     20141113-092954.000419 STATE_TRANSITION c1193025-b416-49d7-adc2-10afe2389141" : {
>       "AdditionalInfo" : "Message execution failed. msgId: 12690ce8-8098-46b1-a93d-279604f0e3db, errorMsg: org.apache.helix.messaging.handling.HelixStateTransitionHandler$HelixStateMismatchException: Current state of stateModel does not match the fromState in Message, Current State:ONLINE, message expected:OFFLINE, partition: 490, from: hdfsterrapin-a-namenode001_9090, to: hdfsterrapin-a-datanode-ba3ad256",
>       "Class" : "class org.apache.helix.messaging.handling.HelixStateTransitionHandler",
>       "MSG_ID" : "12690ce8-8098-46b1-a93d-279604f0e3db",
>       "Message state" : "READ"
>     },
> What could be causing this state mismatch? When I restart the node, the error disappears
> (meaning that the node is then able to perform the state transition).
> Thanks
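
For anyone triaging similar reports, a minimal sketch (not part of Helix) of how the WARN line quoted above could be checked for this kind of mismatch: it pulls the KEY=VALUE fields out of the first {...} block and compares FROM_STATE with the currentState the participant reported. The function names and the shortened SAMPLE line are hypothetical, and the parsing assumes the log layout shown in this report.

```python
import re

# Shortened copy of the WARN line from the report; only a few fields are kept.
SAMPLE = (
    "Force CurrentState on Zk to be stateModel's CurrentState. "
    "partitionKey: 490, currentState: ONLINE, "
    "message: 12690ce8-8098-46b1-a93d-279604f0e3db, "
    "{FROM_STATE=OFFLINE, MSG_TYPE=STATE_TRANSITION, "
    "PARTITION_NAME=490, TO_STATE=ONLINE}{}{}"
)


def parse_message_fields(log_line):
    """Return the KEY=VALUE pairs found in the first {...} block of the line."""
    match = re.search(r"\{(.*?)\}", log_line, re.DOTALL)
    if match is None:
        return {}
    fields = {}
    for pair in match.group(1).split(","):
        key, sep, value = pair.strip().partition("=")
        if sep:  # skip fragments that are not KEY=VALUE
            fields[key] = value
    return fields


def reported_current_state(log_line):
    """Return the state named after 'currentState:', or None if absent."""
    match = re.search(r"currentState:\s*(\w+)", log_line)
    return match.group(1) if match else None


def is_state_mismatch(log_line):
    """True when the reported current state differs from the message's FROM_STATE."""
    fields = parse_message_fields(log_line)
    current = reported_current_state(log_line)
    if current is None or "FROM_STATE" not in fields:
        return False
    return current != fields["FROM_STATE"]


if __name__ == "__main__":
    print(parse_message_fields(SAMPLE)["PARTITION_NAME"], is_state_mismatch(SAMPLE))
```

Running this over participant logs would only list the partitions whose in-memory state model disagrees with the transition message; it says nothing about why the two diverged.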



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
