helix-user mailing list archives

From kishore g <g.kish...@gmail.com>
Subject Re: Helix issue - External View out of sync
Date Tue, 18 Nov 2014 00:24:44 GMT
One suggestion is to check for GC pauses on the nodes. Nodes lose their
cluster membership if they go into a long GC pause or start flapping. That
might be the cause of the state mismatch. However, the external view must be
up to date. It would help if you could attach the controller logs and node logs.
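
As a rough way to check for GC pressure from inside a participant JVM, here
is a minimal sketch that uses only the standard JDK management beans, not any
Helix API. The class name, the helper names and the 10 second threshold are
assumptions made up for illustration; pick a threshold relative to your own
ZooKeeper session timeout.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcPauseCheck {
    // Hypothetical threshold for illustration; compare it to your ZooKeeper session timeout.
    static final long THRESHOLD_MILLIS = 10_000;

    /** Sums the accumulated collection time (ms) reported by all collectors in this JVM. */
    static long totalGcTimeMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 when the collector does not report a time
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    /** True if the GC time accumulated between two samples is above the threshold. */
    static boolean gcTimeExceeded(long beforeMillis, long afterMillis, long thresholdMillis) {
        return afterMillis - beforeMillis > thresholdMillis;
    }

    public static void main(String[] args) throws InterruptedException {
        long before = totalGcTimeMillis();
        Thread.sleep(1_000); // sampling interval; a real check would run periodically
        long after = totalGcTimeMillis();
        System.out.println("GC time in interval (ms): " + (after - before));
        if (gcTimeExceeded(before, after, THRESHOLD_MILLIS)) {
            System.out.println("GC time is high enough to risk a session expiry");
        }
    }
}
```

This only reports accumulated collection time per interval, and for
concurrent collectors not all of that time is a stop-the-world pause. The GC
logs of the node remain the more reliable source for individual pause lengths.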

On Mon, Nov 17, 2014 at 4:10 PM, Varun Sharma <varun@pinterest.com> wrote:

> Hi,
>
> I am seeing the following issue for many partitions in Helix using a
> simple Online->Offline state model factory. The external view says that the
> partition has been assigned to 3 hosts. However, when I look at the hosts,
> only 1 of them executed the OFFLINE --> ONLINE transition.
>
> On the hosts that did not execute the transition, I see the following:
>
> 2014-11-13 09:29:54,394 [pool-3-thread-11]
> (HelixStateTransitionHandler.java:206) WARN  *Force CurrentState on Zk to
> be stateModel's CurrentState*. *partitionKey: 490*, currentState: ONLINE,
> message: 12690ce8-8098-46b1-a93d-279604f0e3db,
> {CREATE_TIMESTAMP=1415870993349, ClusterEventName=idealStateChange,
> EXECUTE_START_TIMESTAMP=1415870994382, EXE_SESSION_ID=149a14ada0d0013,
> FROM_STATE=OFFLINE, MSG_ID=*12690ce8-8098-46b1-a93d-279604f0e3db*,
> MSG_STATE=read, MSG_TYPE=STATE_TRANSITION, PARTITION_NAME=490,
> READ_TIMESTAMP=1415870993787,
> RESOURCE_NAME=$terrapin$data$meta_pin_join$1415866960201,
> SRC_NAME=hdfsterrapin-a-namenode001_9090, SRC_SESSION_ID=147a7beb2dd8ed7,
> STATE_MODEL_DEF=OnlineOffline, STATE_MODEL_FACTORY_NAME=DEFAULT,
> TGT_NAME=hdfsterrapin-a-datanode-ba3ad256, TGT_SESSION_ID=149a14ada0d0013,
> TO_STATE=ONLINE}{}{}
>
> When I grep the message ID in the controller, I see the following:
>
> 2014-11-14 09:34:56,265 [StatusDumpTimerTask]
> (ZKPathDataDumpTask.java:155) INFO  {
>   "id" : "149a14ada0d0013__$terrapin$data$meta_pin_join$1415866960201",
>   "mapFields" : {
>     "HELIX_ERROR     20141113-092954.000419 STATE_TRANSITION
> c1193025-b416-49d7-adc2-10afe2389141" : {
>       "AdditionalInfo" : "Message execution failed. msgId:
> 12690ce8-8098-46b1-a93d-279604f0e3db, errorMsg:
> org.apache.helix.messaging.handling.
> *HelixStateTransitionHandler$HelixStateMismatchException*: Current state
> of stateModel does not match the fromState in Message, Current
> State:ONLINE, message expected:OFFLINE, partition: 490, from:
> hdfsterrapin-a-namenode001_9090, to: hdfsterrapin-a-datanode-ba3ad256",
>       "Class" : "class
> org.apache.helix.messaging.handling.HelixStateTransitionHandler",
>       "MSG_ID" : "12690ce8-8098-46b1-a93d-279604f0e3db",
>       "Message state" : "READ"
>     },
>
> When I restart the node, the error disappears (meaning that the node is
> able to perform the state transition). What could be causing this state
> mismatch?
>
>
> Thanks
>
> Varun
>
