helix-user mailing list archives

From Steph Meslin-Weber <st...@tangency.co.uk>
Subject Re: NPE during start up
Date Mon, 16 Feb 2015 07:35:38 GMT
Hi Kishore,

That's right, the node doesn't process any state transitions. They should
have been logged in the first set of logs had they occurred.

Thanks,
Steph
On 16 Feb 2015 07:28, "kishore g" <g.kishore@gmail.com> wrote:

> Hi Steph,
>
> When the NPE occurs, do you get the state transition callbacks?
>
> thanks,
> Kishore G
>
>
>
> On Sun, Feb 15, 2015 at 11:23 PM, Steph Meslin-Weber <steph@tangency.co.uk
> > wrote:
>
>> Unfortunately it appears that when the NPE occurs, dropping the
>> participant no longer cleans up the related INSTANCE node. Perhaps some
>> state is lost?
>>
>> Thanks,
>> Steph
>> On 16 Feb 2015 06:52, "Zhen Zhang" <nehzgnahz@gmail.com> wrote:
>>
>>> I think the NPE is not fatal. It happens when no message handler factory
>>> is registered for this message type. The message will not be removed and
>>> will remain in the UNREAD state. Later, when the message handler factory
>>> is registered via DefaultMessagingService#registerMessageHandlerFactory,
>>> we will send a NOP message, which will in turn trigger HelixTaskExecutor
>>> to process all UNREAD messages. We should definitely fix this by logging
>>> a warning message instead of throwing an NPE.
>>>
>>> Thanks,
>>> Jason
>>>
>>>
>>> On Sun, Feb 15, 2015 at 7:30 PM, kishore g <g.kishore@gmail.com> wrote:
>>>
>>>> Controller assuming the state transition occurred is even more
>>>> dangerous.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On Sun, Feb 15, 2015 at 7:18 PM, vlad.gm@gmail.com <vlad.gm@gmail.com>
>>>> wrote:
>>>>
>>>>> In my experience it was fatal. The callback would not be called, but the
>>>>> controller would somehow assume the state transition occurred.
>>>>> On Feb 15, 2015 7:13 PM, "kishore g" <g.kishore@gmail.com> wrote:
>>>>>
>>>>> > Thanks Vlad. That explains the problem. It also explains why adding a
>>>>> > 3-second sleep works.
>>>>> >
>>>>> > Jason, is this exception fatal? Will the message be processed again
>>>>> > after the handler is added?
>>>>> >
>>>>> > thanks,
>>>>> > Kishore G
>>>>> >
>>>>> > On Sun, Feb 15, 2015 at 6:41 PM, vlad.gm@gmail.com <vlad.gm@gmail.com>
>>>>> > wrote:
>>>>> >
>>>>> >> https://issues.apache.org/jira/browse/HELIX-548
>>>>> >> On Feb 15, 2015 6:38 PM, "kishore g" <g.kishore@gmail.com> wrote:
>>>>> >>
>>>>> >> > Hi Vlad,
>>>>> >> >
>>>>> >> > Was there any jira associated with it?
>>>>> >> >
>>>>> >> > thanks.
>>>>> >> > Kishore G
>>>>> >> >
>>>>> >> > On Sun, Feb 15, 2015 at 4:36 PM, vlad.gm@gmail.com <vlad.gm@gmail.com>
>>>>> >> > wrote:
>>>>> >> >
>>>>> >> >> Looks like the same problem we encountered recently.
>>>>> >> >>
>>>>> >> >> Regards,
>>>>> >> >> Vlad
>>>>> >> >> On Feb 15, 2015 4:35 PM, "kishore g" <g.kishore@gmail.com> wrote:
>>>>> >> >>
>>>>> >> >> > Steph described this problem on IRC.
>>>>> >> >> >
>>>>> >> >> > He is using 0.7.1. On connecting to the cluster he gets this NPE:
>>>>> >> >> >
>>>>> >> >> > http://pastebin.com/YE3fwK5i
>>>>> >> >> >
>>>>> >> >> > java.lang.NullPointerException
>>>>> >> >> >         at org.apache.helix.messaging.handling.HelixTaskExecutor.createMessageHandler(HelixTaskExecutor.java:661)
>>>>> >> >> >         at org.apache.helix.messaging.handling.HelixTaskExecutor.onMessage(HelixTaskExecutor.java:581)
>>>>> >> >> >         at org.apache.helix.manager.zk.ZkCallbackHandler.invoke(ZkCallbackHandler.java:202)
>>>>> >> >> >         at org.apache.helix.manager.zk.ZkCallbackHandler.init(ZkCallbackHandler.java:336)
>>>>> >> >> >         at org.apache.helix.manager.zk.ZkCallbackHandler.<init>(ZkCallbackHandler.java:130)
>>>>> >> >> >         at org.apache.helix.manager.zk.ZkHelixConnection.addListener(ZkHelixConnection.java:533)
>>>>> >> >> >         at org.apache.helix.manager.zk.ZkHelixConnection.addMessageListener(ZkHelixConnection.java:267)
>>>>> >> >> >         at org.apache.helix.manager.zk.ZkHelixParticipant.setupMsgHandler(ZkHelixParticipant.java:347)
>>>>> >> >> >         at org.apache.helix.manager.zk.ZkHelixParticipant.init(ZkHelixParticipant.java:383)
>>>>> >> >> >         at org.apache.helix.manager.zk.ZkHelixParticipant.onConnected(ZkHelixParticipant.java:401)
>>>>> >> >> >         at org.apache.helix.manager.zk.ZkHelixParticipant.start(ZkHelixParticipant.java:428)
>>>>> >> >> >         at com.example.ProtostuffServerNode.spinUpParticipant(ProtostuffServerNode.java:134)
>>>>> >> >> >
>>>>> >> >> >
>>>>> >> >> > Here is his connection code.
>>>>> >> >> >
>>>>> >> >> > http://pastebin.com/QRfVU1tc
>>>>> >> >> >
>>>>> >> >> > private static HelixParticipant spinUpParticipant(HelixAdmin admin,
>>>>> >> >> >         ParticipantId participantId) {
>>>>> >> >> >     LOGGER.info("Starting up " + participantId);
>>>>> >> >> >     HelixConnection connection = new ZkHelixConnection(ZK_ADDRESS);
>>>>> >> >> >     connection.connect();
>>>>> >> >> >     HelixParticipant participant =
>>>>> >> >> >             connection.createParticipant(CLUSTER_ID, participantId);
>>>>> >> >> >     StateMachineEngine stateMach = participant.getStateMachineEngine();
>>>>> >> >> >
>>>>> >> >> >     StateTransitionHandlerFactory<LocalTransitionHandler> transitionHandlerFactory =
>>>>> >> >> >             new OnlineOfflineHandlerFactory();
>>>>> >> >> >     stateMach.registerStateModelFactory(STATE_MODEL_NAME, transitionHandlerFactory);
>>>>> >> >> >     participant.start();
>>>>> >> >> >
>>>>> >> >> >     admin.enableInstance(CLUSTER_NAME, participantId.toString(), true);
>>>>> >> >> >
>>>>> >> >> >     return participant;
>>>>> >> >> > }
>>>>> >> >> >
>>>>> >> >> > Adding a 3s sleep after registerStateModelFactory works. Any idea what
>>>>> >> >> > is happening?
>>>>> >> >> >
>>>>> >> >> > thanks,
>>>>> >> >> > Kishore G
>>>>> >> >> >
>>>>> >> >> >
>>>>> >> >> >
>>>>> >> >> >
>>>>> >> >>
>>>>> >> >
>>>>> >> >
>>>>> >>
>>>>> >
>>>>> >
>>>>>
>>>>
>>>>
>>>
>
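
For readers following the thread: the behaviour Jason describes (a message with no registered handler factory stays UNREAD, gets picked up once a factory is registered, and a warning is logged instead of an NPE being thrown) can be sketched roughly as below. This is a self-contained illustration only, not Helix code. The names UnreadMessageSketch, deliver, registerFactory and processUnread are hypothetical, and the direct processUnread() call inside registerFactory merely stands in for the NOP message mentioned in the thread.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.logging.Logger;

// Hypothetical stand-in for a message executor; not the real HelixTaskExecutor.
class UnreadMessageSketch {
    enum State { UNREAD, READ }

    static final class Message {
        final String type;
        State state = State.UNREAD;
        Message(String type) { this.type = type; }
    }

    // Hypothetical, simplified handler factory: it just handles the message.
    interface HandlerFactory {
        void handle(Message message);
    }

    private static final Logger LOG = Logger.getLogger("UnreadMessageSketch");
    private final Map<String, HandlerFactory> factories = new HashMap<>();
    private final List<Message> inbox = new ArrayList<>();

    // A new message arrives; try to process everything that is still UNREAD.
    void deliver(Message message) {
        inbox.add(message);
        processUnread();
    }

    // Registering a factory re-scans the inbox (assumption: this plays the
    // role of the NOP message that triggers reprocessing).
    void registerFactory(String type, HandlerFactory factory) {
        factories.put(type, factory);
        processUnread();
    }

    void processUnread() {
        for (Message message : inbox) {
            if (message.state != State.UNREAD) {
                continue;
            }
            HandlerFactory factory = factories.get(message.type);
            if (factory == null) {
                // Log and leave the message UNREAD rather than dereferencing null.
                LOG.warning("No handler factory for message type " + message.type
                        + "; leaving message UNREAD");
                continue;
            }
            factory.handle(message);
            message.state = State.READ;
        }
    }

    public static void main(String[] args) {
        UnreadMessageSketch executor = new UnreadMessageSketch();
        Message message = new Message("STATE_TRANSITION");

        executor.deliver(message);          // no factory yet: warning is logged
        System.out.println(message.state);  // prints UNREAD

        executor.registerFactory("STATE_TRANSITION", m -> { });
        System.out.println(message.state);  // prints READ
    }
}
```

Whether the real controller also waits for the message to be handled before assuming the transition happened is exactly the open question in the thread; the sketch only covers the participant-side handling.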
