helix-user mailing list archives

From Zhen Zhang <nehzgn...@gmail.com>
Subject Re: NPE during start up
Date Mon, 16 Feb 2015 07:45:42 GMT
There might be a race condition here. I need to double-check this.
On Feb 15, 2015 11:38 PM, "Steph Meslin-Weber" <steph@tangency.co.uk> wrote:

> Hi Kishore,
>
> That's right, the node doesn't process any state transitions. They should
> have been logged in the first set of logs had they occurred.
>
> Thanks,
> Steph
> On 16 Feb 2015 07:28, "kishore g" <g.kishore@gmail.com> wrote:
>
>> Hi Steph,
>>
>> When the NPE occurs, do you get the state transition callbacks?
>>
>> thanks,
>> Kishore G
>>
>>
>>
>> On Sun, Feb 15, 2015 at 11:23 PM, Steph Meslin-Weber <
>> steph@tangency.co.uk> wrote:
>>
>>> Unfortunately, it appears that when the NPE occurs, dropping the
>>> participant no longer cleans up the related INSTANCE node. Perhaps some
>>> state is lost?
>>>
>>> Thanks,
>>> Steph
>>> On 16 Feb 2015 06:52, "Zhen Zhang" <nehzgnahz@gmail.com> wrote:
>>>
>>>> I think the NPE is not fatal. It happens when no message handler
>>>> factory is registered for this message type. The message is not
>>>> removed and remains in the UNREAD state. Later, when the message handler
>>>> factory is registered via
>>>> DefaultMessagingService#registerMessageHandlerFactory, we send a
>>>> NOP message, which in turn triggers HelixTaskExecutor to process all
>>>> UNREAD messages. We should definitely fix this by logging a warning message
>>>> instead of throwing an NPE.
>>>>
>>>> Thanks,
>>>> Jason
>>>>
>>>>
>>>> On Sun, Feb 15, 2015 at 7:30 PM, kishore g <g.kishore@gmail.com> wrote:
>>>>
>>>>> The controller assuming that the state transition occurred is even more
>>>>> dangerous.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Sun, Feb 15, 2015 at 7:18 PM, vlad.gm@gmail.com <vlad.gm@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> In my experience it was fatal. The callback would not be called, but the
>>>>>> controller would somehow assume the state transition occurred.
>>>>>> On Feb 15, 2015 7:13 PM, "kishore g" <g.kishore@gmail.com> wrote:
>>>>>>
>>>>>> > Thanks Vlad. That explains the problem. That also explains why adding
>>>>>> > a sleep of 3 seconds works.
>>>>>> >
>>>>>> > Jason, is this exception fatal? Will the message be processed again
>>>>>> > after the handler is added?
>>>>>> >
>>>>>> > thanks,
>>>>>> > Kishore G
>>>>>> >
>>>>>> > On Sun, Feb 15, 2015 at 6:41 PM, vlad.gm@gmail.com <vlad.gm@gmail.com>
>>>>>> > wrote:
>>>>>> >
>>>>>> >> https://issues.apache.org/jira/browse/HELIX-548
>>>>>> >> On Feb 15, 2015 6:38 PM, "kishore g" <g.kishore@gmail.com> wrote:
>>>>>> >>
>>>>>> >> > Hi Vlad,
>>>>>> >> >
>>>>>> >> > Was there any jira associated with it?
>>>>>> >> >
>>>>>> >> > thanks.
>>>>>> >> > Kishore G
>>>>>> >> >
>>>>>> >> > On Sun, Feb 15, 2015 at 4:36 PM, vlad.gm@gmail.com <vlad.gm@gmail.com>
>>>>>> >> > wrote:
>>>>>> >> >
>>>>>> >> >> Looks like the same problem we encountered recently.
>>>>>> >> >>
>>>>>> >> >> Regards,
>>>>>> >> >> Vlad
>>>>>> >> >> On Feb 15, 2015 4:35 PM, "kishore g" <g.kishore@gmail.com> wrote:
>>>>>> >> >>
>>>>>> >> >> > Steph described this problem on IRC.
>>>>>> >> >> >
>>>>>> >> >> > He is using 0.7.1. On connecting to the cluster, he gets this NPE:
>>>>>> >> >> >
>>>>>> >> >> > http://pastebin.com/YE3fwK5i
>>>>>> >> >> >
>>>>>> >> >> > java.lang.NullPointerException
>>>>>> >> >> >         at org.apache.helix.messaging.handling.HelixTaskExecutor.createMessageHandler(HelixTaskExecutor.java:661)
>>>>>> >> >> >         at org.apache.helix.messaging.handling.HelixTaskExecutor.onMessage(HelixTaskExecutor.java:581)
>>>>>> >> >> >         at org.apache.helix.manager.zk.ZkCallbackHandler.invoke(ZkCallbackHandler.java:202)
>>>>>> >> >> >         at org.apache.helix.manager.zk.ZkCallbackHandler.init(ZkCallbackHandler.java:336)
>>>>>> >> >> >         at org.apache.helix.manager.zk.ZkCallbackHandler.<init>(ZkCallbackHandler.java:130)
>>>>>> >> >> >         at org.apache.helix.manager.zk.ZkHelixConnection.addListener(ZkHelixConnection.java:533)
>>>>>> >> >> >         at org.apache.helix.manager.zk.ZkHelixConnection.addMessageListener(ZkHelixConnection.java:267)
>>>>>> >> >> >         at org.apache.helix.manager.zk.ZkHelixParticipant.setupMsgHandler(ZkHelixParticipant.java:347)
>>>>>> >> >> >         at org.apache.helix.manager.zk.ZkHelixParticipant.init(ZkHelixParticipant.java:383)
>>>>>> >> >> >         at org.apache.helix.manager.zk.ZkHelixParticipant.onConnected(ZkHelixParticipant.java:401)
>>>>>> >> >> >         at org.apache.helix.manager.zk.ZkHelixParticipant.start(ZkHelixParticipant.java:428)
>>>>>> >> >> >         at com.example.ProtostuffServerNode.spinUpParticipant(ProtostuffServerNode.java:134)
>>>>>> >> >> >
>>>>>> >> >> >
>>>>>> >> >> > Here is his connection code.
>>>>>> >> >> >
>>>>>> >> >> > http://pastebin.com/QRfVU1tc
>>>>>> >> >> >
>>>>>> >> >> > private static HelixParticipant spinUpParticipant(HelixAdmin admin,
>>>>>> >> >> >         ParticipantId participantId) {
>>>>>> >> >> >     LOGGER.info("Starting up " + participantId);
>>>>>> >> >> >     HelixConnection connection = new ZkHelixConnection(ZK_ADDRESS);
>>>>>> >> >> >     connection.connect();
>>>>>> >> >> >     HelixParticipant participant =
>>>>>> >> >> >             connection.createParticipant(CLUSTER_ID, participantId);
>>>>>> >> >> >     StateMachineEngine stateMach = participant.getStateMachineEngine();
>>>>>> >> >> >
>>>>>> >> >> >     StateTransitionHandlerFactory<LocalTransitionHandler> transitionHandlerFactory =
>>>>>> >> >> >             new OnlineOfflineHandlerFactory();
>>>>>> >> >> >     stateMach.registerStateModelFactory(STATE_MODEL_NAME, transitionHandlerFactory);
>>>>>> >> >> >     participant.start();
>>>>>> >> >> >
>>>>>> >> >> >     admin.enableInstance(CLUSTER_NAME, participantId.toString(), true);
>>>>>> >> >> >
>>>>>> >> >> >     return participant;
>>>>>> >> >> > }
>>>>>> >> >> >
>>>>>> >> >> > Adding a 3s sleep after registerStateModelFactory works. Any idea what
>>>>>> >> >> > is happening?
>>>>>> >> >> >
>>>>>> >> >> > thanks,
>>>>>> >> >> > Kishore G
>>>>>> >> >> >
>>>>>> >> >> >
>>>>>> >> >> >
>>>>>> >> >> >
>>>>>> >> >>
>>>>>> >> >
>>>>>> >> >
>>>>>> >>
>>>>>> >
>>>>>> >
>>>>>>
>>>>>
>>>>>
>>>>
>>
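
The recovery path Jason describes in this thread (no factory registered for a message type, message left UNREAD, reprocessing triggered once a factory is registered) can be sketched roughly as below. This is a minimal, hypothetical model in plain Java and not Helix code: the class UnreadMessageDispatcher, its methods, and the use of a Consumer as a stand-in for a message handler factory are all assumptions made for illustration. It logs a warning at the point where the stack trace in this thread shows an NPE, which is the fix Jason suggests.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;
import java.util.logging.Logger;

// Hypothetical model, not Helix code: messages whose type has no registered
// handler stay UNREAD and are retried when a handler is registered later.
class UnreadMessageDispatcher {
    private static final Logger LOG = Logger.getLogger("UnreadMessageDispatcher");

    // One handler per message type (stands in for a message handler factory).
    private final Map<String, Consumer<String>> handlers = new HashMap<>();
    // Messages left UNREAD, stored as {messageId, messageType}.
    private final List<String[]> unread = new ArrayList<>();
    // Ids of messages that were handed to a handler, in delivery order.
    private final List<String> processed = new ArrayList<>();

    synchronized void onMessage(String id, String type) {
        Consumer<String> handler = handlers.get(type);
        if (handler == null) {
            // Log a warning instead of dereferencing null; keep the message UNREAD.
            LOG.warning("No handler registered for type " + type
                    + "; leaving message " + id + " UNREAD");
            unread.add(new String[] {id, type});
            return;
        }
        handler.accept(id);
        processed.add(id);
    }

    synchronized void registerHandler(String type, Consumer<String> handler) {
        handlers.put(type, handler);
        // Plays the role of the NOP message: re-scan everything still UNREAD.
        Iterator<String[]> it = unread.iterator();
        while (it.hasNext()) {
            String[] msg = it.next();
            Consumer<String> h = handlers.get(msg[1]);
            if (h != null) {
                it.remove();
                h.accept(msg[0]);
                processed.add(msg[0]);
            }
        }
    }

    synchronized int unreadCount() {
        return unread.size();
    }

    synchronized List<String> processedIds() {
        return new ArrayList<>(processed);
    }
}
```

In this model, calling onMessage before registerHandler leaves the message unread, and the later registerHandler call for that type then processes it.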
