helix-user mailing list archives

From kishore g <g.kish...@gmail.com>
Subject Re: NPE during start up
Date Mon, 16 Feb 2015 19:36:16 GMT
Is there any workaround for this, and is it fatal as Vlad mentioned?

On Mon, Feb 16, 2015 at 10:28 AM, Zhen Zhang <nehzgnahz@gmail.com> wrote:

> There is a timing issue in ZkHelixParticipant#setupMsgHandler(). We should
> hook up the ZK callback (line 347 in
> https://github.com/apache/helix/blob/master/helix-core/src/main/java/org/apache/helix/manager/zk/ZkHelixParticipant.java)
> only after all message handler registrations are done (line 354 in
> https://github.com/apache/helix/blob/master/helix-core/src/main/java/org/apache/helix/manager/zk/ZkHelixParticipant.java).
> The fix is to move the ZK callback registration to the end. I will add a test
> case that reliably reproduces this issue.
>
> Thanks,
> Zhen
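The ordering problem Zhen describes can be modeled in a few lines of plain Java. This is a hypothetical sketch, not Helix code: SetupOrderSketch, Executor, addMessageListener and the "STATE_TRANSITION" string are illustrative stand-ins. It assumes, as the stack trace further down suggests (ZkCallbackHandler.<init> calls init, which calls invoke), that adding the message listener processes already-pending messages synchronously, so a factory registered afterwards comes too late for those messages.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical model of the startup ordering in setupMsgHandler(); not Helix code.
public class SetupOrderSketch {
    // Stand-in for HelixTaskExecutor: dispatches each message to the factory for its type.
    static class Executor {
        private final Map<String, UnaryOperator<String>> factories = new HashMap<>();
        final List<String> processed = new ArrayList<>();
        final List<String> unread = new ArrayList<>();

        void registerFactory(String type, UnaryOperator<String> factory) {
            factories.put(type, factory);
        }

        void onMessages(List<String> messageTypes) {
            for (String type : messageTypes) {
                UnaryOperator<String> factory = factories.get(type);
                if (factory == null) {
                    // Dereferencing the missing factory here is the analogue of the NPE in the trace.
                    unread.add(type);
                } else {
                    processed.add(factory.apply(type));
                }
            }
        }
    }

    // Assumed to behave like ZkCallbackHandler.<init> in the trace: adding the
    // listener immediately invokes the callback on messages that are already pending.
    static void addMessageListener(Executor executor, List<String> pending) {
        executor.onMessages(pending);
    }

    static Executor setup(boolean listenerFirst, List<String> pending) {
        Executor executor = new Executor();
        if (listenerFirst) {
            addMessageListener(executor, pending);                   // order that hits the NPE
            executor.registerFactory("STATE_TRANSITION", m -> m + ":handled");
        } else {
            executor.registerFactory("STATE_TRANSITION", m -> m + ":handled");
            addMessageListener(executor, pending);                   // proposed order
        }
        return executor;
    }

    public static void main(String[] args) {
        List<String> pending = Arrays.asList("STATE_TRANSITION");
        System.out.println(setup(true, pending).unread);      // [STATE_TRANSITION]
        System.out.println(setup(false, pending).processed);  // [STATE_TRANSITION:handled]
    }
}
```

With the listener-first order the pending message finds no factory (where Helix threw the NPE); with the factory-first order it is handled on the very first callback.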
>
>
> On Sun, Feb 15, 2015 at 11:45 PM, Zhen Zhang <nehzgnahz@gmail.com> wrote:
>
>> Might be a race condition; I need to double-check this.
>> On Feb 15, 2015 11:38 PM, "Steph Meslin-Weber" <steph@tangency.co.uk>
>> wrote:
>>
>>> Hi Kishore,
>>>
>>> That's right: the node doesn't process any state transitions. They
>>> would have been logged in the first set of logs had they occurred.
>>>
>>> Thanks,
>>> Steph
>>> On 16 Feb 2015 07:28, "kishore g" <g.kishore@gmail.com> wrote:
>>>
>>>> Hi Steph,
>>>>
>>>> When the NPE occurs, do you get the state transition callbacks?
>>>>
>>>> thanks,
>>>> Kishore G
>>>>
>>>>
>>>>
>>>> On Sun, Feb 15, 2015 at 11:23 PM, Steph Meslin-Weber <
>>>> steph@tangency.co.uk> wrote:
>>>>
>>>>> Unfortunately it appears that when the NPE occurs, dropping the
>>>>> participant no longer cleans up the related INSTANCE node. Perhaps some
>>>>> state is lost?
>>>>>
>>>>> Thanks,
>>>>> Steph
>>>>> On 16 Feb 2015 06:52, "Zhen Zhang" <nehzgnahz@gmail.com> wrote:
>>>>>
>>>>>> I think the NPE is not fatal. It happens when no message handler
>>>>>> factory is registered for the message type. The message is not
>>>>>> removed and remains in the UNREAD state. Later, when the message
>>>>>> handler factory is registered via
>>>>>> DefaultMessagingService#registerMessageHandlerFactory, we send a
>>>>>> NOP message, which in turn triggers HelixTaskExecutor to process all
>>>>>> UNREAD messages. We should definitely fix this by logging a warning
>>>>>> instead of throwing an NPE.
>>>>>>
>>>>>> Thanks,
>>>>>> Jason
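Jason's description (warn instead of throwing, leave the message UNREAD, and let the NOP sent on factory registration trigger a rescan) can be sketched as a simplified, single-threaded model. Everything here is hypothetical: UnreadRetrySketch, Message, processUnread and the "NOP" type string are illustrative, and registerMessageHandlerFactory only borrows its name from DefaultMessagingService, not its real signature or behavior.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;
import java.util.logging.Logger;

// Hypothetical model of the recovery path Jason describes; names are illustrative, not Helix APIs.
public class UnreadRetrySketch {
    private static final Logger LOG = Logger.getLogger(UnreadRetrySketch.class.getName());

    enum State { UNREAD, READ }

    static class Message {
        final String type;
        State state = State.UNREAD;

        Message(String type) {
            this.type = type;
        }
    }

    private final Map<String, Consumer<Message>> factories = new HashMap<>();
    private final List<Message> inbox = new ArrayList<>();
    final List<String> handled = new ArrayList<>();

    UnreadRetrySketch() {
        factories.put("NOP", m -> { }); // a NOP does nothing except trigger a rescan
    }

    void deliver(Message message) {
        inbox.add(message);
        processUnread();
    }

    // Process every UNREAD message; warn and skip (instead of throwing an NPE)
    // when no factory is registered for the message type.
    void processUnread() {
        for (Message m : inbox) {
            if (m.state != State.UNREAD) {
                continue;
            }
            Consumer<Message> factory = factories.get(m.type);
            if (factory == null) {
                LOG.warning("No handler factory for " + m.type + "; leaving message UNREAD");
                continue;
            }
            factory.accept(m);
            m.state = State.READ;
        }
    }

    // Registering a factory sends a NOP, which makes the executor revisit UNREAD messages.
    void registerMessageHandlerFactory(String type, Consumer<Message> factory) {
        factories.put(type, factory);
        deliver(new Message("NOP"));
    }

    long unreadCount() {
        return inbox.stream().filter(m -> m.state == State.UNREAD).count();
    }

    public static void main(String[] args) {
        UnreadRetrySketch executor = new UnreadRetrySketch();
        executor.deliver(new Message("STATE_TRANSITION")); // arrives before any factory exists
        System.out.println(executor.unreadCount());         // 1
        executor.registerMessageHandlerFactory("STATE_TRANSITION", m -> executor.handled.add(m.type));
        System.out.println(executor.unreadCount());         // 0
        System.out.println(executor.handled);               // [STATE_TRANSITION]
    }
}
```

Other replies in the thread (Vlad finding it fatal, Steph seeing no state transitions) suggest this recovery path did not reliably take effect in 0.7.1.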
>>>>>>
>>>>>>
>>>>>> On Sun, Feb 15, 2015 at 7:30 PM, kishore g <g.kishore@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> The controller assuming the state transition occurred is even more
>>>>>>> dangerous.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Sun, Feb 15, 2015 at 7:18 PM, vlad.gm@gmail.com <
>>>>>>> vlad.gm@gmail.com> wrote:
>>>>>>>
>>>>>>>> In my experience it was fatal. The callback would not be called, but
>>>>>>>> the controller would somehow assume the state transition occurred.
>>>>>>>> On Feb 15, 2015 7:13 PM, "kishore g" <g.kishore@gmail.com> wrote:
>>>>>>>>
>>>>>>>> > Thanks Vlad. That explains the problem. It also explains why adding
>>>>>>>> > a 3-second sleep works.
>>>>>>>> >
>>>>>>>> > Jason, is this exception fatal? Will the message be processed again
>>>>>>>> > after the handler is added?
>>>>>>>> >
>>>>>>>> > thanks,
>>>>>>>> > Kishore G
>>>>>>>> >
>>>>>>>> > On Sun, Feb 15, 2015 at 6:41 PM, vlad.gm@gmail.com <vlad.gm@gmail.com>
>>>>>>>> > wrote:
>>>>>>>> >
>>>>>>>> >> https://issues.apache.org/jira/browse/HELIX-548
>>>>>>>> >> On Feb 15, 2015 6:38 PM, "kishore g" <g.kishore@gmail.com> wrote:
>>>>>>>> >>
>>>>>>>> >> > Hi Vlad,
>>>>>>>> >> >
>>>>>>>> >> > Was there any jira associated with it?
>>>>>>>> >> >
>>>>>>>> >> > thanks.
>>>>>>>> >> > Kishore G
>>>>>>>> >> >
>>>>>>>> >> > On Sun, Feb 15, 2015 at 4:36 PM, vlad.gm@gmail.com <vlad.gm@gmail.com>
>>>>>>>> >> > wrote:
>>>>>>>> >> >
>>>>>>>> >> >> Looks like the same problem we encountered recently.
>>>>>>>> >> >>
>>>>>>>> >> >> Regards,
>>>>>>>> >> >> Vlad
>>>>>>>> >> >> On Feb 15, 2015 4:35 PM, "kishore g" <g.kishore@gmail.com> wrote:
>>>>>>>> >> >>
>>>>>>>> >> >> > Steph described this problem on IRC.
>>>>>>>> >> >> >
>>>>>>>> >> >> > He is using 0.7.1. On connecting to the cluster he gets this NPE:
>>>>>>>> >> >> >
>>>>>>>> >> >> > http://pastebin.com/YE3fwK5i
>>>>>>>> >> >> >
>>>>>>>> >> >> > java.lang.NullPointerException
>>>>>>>> >> >> >         at org.apache.helix.messaging.handling.HelixTaskExecutor.createMessageHandler(HelixTaskExecutor.java:661)
>>>>>>>> >> >> >         at org.apache.helix.messaging.handling.HelixTaskExecutor.onMessage(HelixTaskExecutor.java:581)
>>>>>>>> >> >> >         at org.apache.helix.manager.zk.ZkCallbackHandler.invoke(ZkCallbackHandler.java:202)
>>>>>>>> >> >> >         at org.apache.helix.manager.zk.ZkCallbackHandler.init(ZkCallbackHandler.java:336)
>>>>>>>> >> >> >         at org.apache.helix.manager.zk.ZkCallbackHandler.<init>(ZkCallbackHandler.java:130)
>>>>>>>> >> >> >         at org.apache.helix.manager.zk.ZkHelixConnection.addListener(ZkHelixConnection.java:533)
>>>>>>>> >> >> >         at org.apache.helix.manager.zk.ZkHelixConnection.addMessageListener(ZkHelixConnection.java:267)
>>>>>>>> >> >> >         at org.apache.helix.manager.zk.ZkHelixParticipant.setupMsgHandler(ZkHelixParticipant.java:347)
>>>>>>>> >> >> >         at org.apache.helix.manager.zk.ZkHelixParticipant.init(ZkHelixParticipant.java:383)
>>>>>>>> >> >> >         at org.apache.helix.manager.zk.ZkHelixParticipant.onConnected(ZkHelixParticipant.java:401)
>>>>>>>> >> >> >         at org.apache.helix.manager.zk.ZkHelixParticipant.start(ZkHelixParticipant.java:428)
>>>>>>>> >> >> >         at com.example.ProtostuffServerNode.spinUpParticipant(ProtostuffServerNode.java:134)
>>>>>>>> >> >> >
>>>>>>>> >> >> >
>>>>>>>> >> >> > Here is his connection code.
>>>>>>>> >> >> >
>>>>>>>> >> >> > http://pastebin.com/QRfVU1tc
>>>>>>>> >> >> >
>>>>>>>> >> >> > private static HelixParticipant spinUpParticipant(HelixAdmin admin,
>>>>>>>> >> >> >         ParticipantId participantId) {
>>>>>>>> >> >> >     LOGGER.info("Starting up " + participantId);
>>>>>>>> >> >> >     HelixConnection connection = new ZkHelixConnection(ZK_ADDRESS);
>>>>>>>> >> >> >     connection.connect();
>>>>>>>> >> >> >     HelixParticipant participant = connection.createParticipant(CLUSTER_ID, participantId);
>>>>>>>> >> >> >     StateMachineEngine stateMach = participant.getStateMachineEngine();
>>>>>>>> >> >> >
>>>>>>>> >> >> >     StateTransitionHandlerFactory<LocalTransitionHandler> transitionHandlerFactory =
>>>>>>>> >> >> >             new OnlineOfflineHandlerFactory();
>>>>>>>> >> >> >     stateMach.registerStateModelFactory(STATE_MODEL_NAME, transitionHandlerFactory);
>>>>>>>> >> >> >     participant.start();
>>>>>>>> >> >> >
>>>>>>>> >> >> >     admin.enableInstance(CLUSTER_NAME, participantId.toString(), true);
>>>>>>>> >> >> >
>>>>>>>> >> >> >     return participant;
>>>>>>>> >> >> > }
>>>>>>>> >> >> >
>>>>>>>> >> >> > Adding a 3-second sleep after registerStateModelFactory works. Any idea
>>>>>>>> >> >> > what is happening?
>>>>>>>> >> >> >
>>>>>>>> >> >> > thanks,
>>>>>>>> >> >> > Kishore G
>>>>>>>> >> >> >
>>>>>>>> >> >> >
>>>>>>>> >> >> >
>>>>>>>> >> >> >
>>>>>>>> >> >>
>>>>>>>> >> >
>>>>>>>> >> >
>>>>>>>> >>
>>>>>>>> >
>>>>>>>> >
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>
>
