Date: Sun, 15 Feb 2015 23:27:36 -0800
From: kishore g
To: user@helix.apache.org
Cc: dev@helix.apache.org
Subject: Re: NPE during start up

Hi Steph,

When the NPE occurs, do you get the state transition callbacks?

thanks,
Kishore G



On Sun, Feb 15, 2015 at 11:23 PM, Steph Meslin-Weber <steph@tangency.co.uk> wrote:

Unfortunately it appears that when the NPE occurs, dropping the participant no longer cleans up the related INSTANCE node. Perhaps some state is lost?

Thanks,
Steph

On 16 Feb 2015 06:52, "Zhen Zhang" <nehzgnahz@gmail.com> wrote:
I think the NPE is not fatal. It happens when no message handler factory is registered for this message type. The message will not be removed and will remain in the UNREAD state. Later, when the message handler factory is registered via
DefaultMessagingService#registerMessageHandlerFactory, we will send a NOP message, which will in turn trigger HelixTaskExecutor to process all UNREAD messages. We should definitely fix this by logging a warning message instead of throwing an NPE.

Thanks,
Jason
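A minimal sketch of the behaviour described above, in plain Java. Every name here (UnreadReplaySketch, onMessage, registerHandler, the "STATE_TRANSITION" type string) is hypothetical and not the Helix API: a message whose type has no registered handler is left UNREAD with a warning instead of causing an NPE, and registering a handler replays the UNREAD messages. The NOP message that DefaultMessagingService#registerMessageHandlerFactory sends through ZooKeeper is collapsed into a direct replay loop.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical, simplified model of the executor behaviour described above.
// None of these names are Helix classes or methods.
public class UnreadReplaySketch {

    public enum State { UNREAD, READ }

    public static final class Message {
        public final String type;
        public final String payload;
        public State state = State.UNREAD;

        public Message(String type, String payload) {
            this.type = type;
            this.payload = payload;
        }
    }

    // Message type -> handler; stands in for the handler-factory registry.
    private final Map<String, Consumer<Message>> handlers = new HashMap<>();
    // Every message the listener has seen, read or not.
    private final List<Message> inbox = new ArrayList<>();
    public final List<String> warnings = new ArrayList<>();

    // Invoked when the message listener fires.
    public synchronized void onMessage(Message msg) {
        if (!inbox.contains(msg)) {
            inbox.add(msg);
        }
        process(msg);
    }

    // Registering a handler replays all UNREAD messages (stands in for the NOP-message trigger).
    public synchronized void registerHandler(String type, Consumer<Message> handler) {
        handlers.put(type, handler);
        for (Message m : new ArrayList<>(inbox)) { // copy: a handler may add messages
            process(m);
        }
    }

    private void process(Message msg) {
        if (msg.state != State.UNREAD) {
            return; // already handled
        }
        Consumer<Message> handler = handlers.get(msg.type);
        if (handler == null) {
            // Proposed fix: warn and leave the message UNREAD instead of throwing an NPE.
            warnings.add("No handler registered for message type " + msg.type);
            return;
        }
        msg.state = State.READ;
        handler.accept(msg);
    }

    public static void main(String[] args) {
        UnreadReplaySketch executor = new UnreadReplaySketch();
        Message m = new Message("STATE_TRANSITION", "OFFLINE->ONLINE");
        executor.onMessage(m); // no handler yet: warning recorded, message stays UNREAD
        System.out.println(m.state + " " + executor.warnings);
        executor.registerHandler("STATE_TRANSITION",
            msg -> System.out.println("handled " + msg.payload));
        System.out.println(m.state); // handled during the replay on registration
    }
}
```

Calling onMessage before registerHandler records a warning and leaves the message UNREAD; the later registerHandler call picks it up. This models only the non-fatal path described here, not the controller-side behaviour reported later in the thread.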


On Sun, Feb 15, 2015 at 7:30 PM, kishore g <g.kishore@gmail.com> wrote:
The controller assuming the state transition occurred is even more dangerous.





On Sun, Feb 15, 2015 at 7:18 PM, vlad.gm@gmail.com <vlad.gm@gmail.com> wrote:
In my experience it was fatal. The callback would not be called, but the
controller would somehow assume the state transition occurred.
On Feb 15, 2015 7:13 PM, "kishore g" <g.kishore@gmail.com> wrote:

> Thanks Vlad. That explains the problem. That also explains why adding a
> 3-second sleep works.
>
> Jason, is this exception fatal? Will the message be processed again after
> the handler is added?
>
> thanks,
> Kishore G
>
> On Sun, Feb 15, 2015 at 6:41 PM, vlad.gm@gmail.com <vlad.gm@gmail.com>
> wrote:
>
>> https://issues.apache.org/jira/browse/HELIX-548
>> On Feb 15, 2015 6:38 PM, "kishore g" <g.kishore@gmail.com> wrote:
>>
>> > Hi Vlad,
>> >
>> > Was there any jira associated with it?
>> >
>> > thanks.
>> > Kishore G
>> >
>> > On Sun, Feb 15, 2015 at 4:36 PM, vlad.gm@gmail.com <vlad.gm@gmail.com>
>> > wrote:
>> >
>> >> Looks like the same problem we encountered recently.
>> >>
>> >> Regards,
>> >> Vlad
>> >> On Feb 15, 2015 4:35 PM, "kishore g" <g.kishore@gmail.com> wrote:
>> >>
>> >> > Steph described this problem on IRC.
>> >> >
>> >> > He is using 0.7.1. On connecting to the cluster he gets this NPE:
>> >> >
>> >> > http://pastebin.com/YE3fwK5i
>> >> >
>> >> > java.lang.NullPointerException
>> >> >     at org.apache.helix.messaging.handling.HelixTaskExecutor.createMessageHandler(HelixTaskExecutor.java:661)
>> >> >     at org.apache.helix.messaging.handling.HelixTaskExecutor.onMessage(HelixTaskExecutor.java:581)
>> >> >     at org.apache.helix.manager.zk.ZkCallbackHandler.invoke(ZkCallbackHandler.java:202)
>> >> >     at org.apache.helix.manager.zk.ZkCallbackHandler.init(ZkCallbackHandler.java:336)
>> >> >     at org.apache.helix.manager.zk.ZkCallbackHandler.<init>(ZkCallbackHandler.java:130)
>> >> >     at org.apache.helix.manager.zk.ZkHelixConnection.addListener(ZkHelixConnection.java:533)
>> >> >     at org.apache.helix.manager.zk.ZkHelixConnection.addMessageListener(ZkHelixConnection.java:267)
>> >> >     at org.apache.helix.manager.zk.ZkHelixParticipant.setupMsgHandler(ZkHelixParticipant.java:347)
>> >> >     at org.apache.helix.manager.zk.ZkHelixParticipant.init(ZkHelixParticipant.java:383)
>> >> >     at org.apache.helix.manager.zk.ZkHelixParticipant.onConnected(ZkHelixParticipant.java:401)
>> >> >     at org.apache.helix.manager.zk.ZkHelixParticipant.start(ZkHelixParticipant.java:428)
>> >> >     at com.example.ProtostuffServerNode.spinUpParticipant(ProtostuffServerNode.java:134)
>> >> >
>> >> >
>> >> > Here is his connection code.
>> >> >
>> >> > http://pastebin.com/QRfVU1tc
>> >> >
>> >> > private static HelixParticipant spinUpParticipant(HelixAdmin admin,
>> >> >         ParticipantId participantId) {
>> >> >     LOGGER.info("Starting up " + participantId);
>> >> >     HelixConnection connection = new ZkHelixConnection(ZK_ADDRESS);
>> >> >     connection.connect();
>> >> >     HelixParticipant participant =
>> >> >         connection.createParticipant(CLUSTER_ID, participantId);
>> >> >     StateMachineEngine stateMach = participant.getStateMachineEngine();
>> >> >
>> >> >     StateTransitionHandlerFactory<LocalTransitionHandler>
>> >> >         transitionHandlerFactory = new OnlineOfflineHandlerFactory();
>> >> >     stateMach.registerStateModelFactory(STATE_MODEL_NAME,
>> >> >         transitionHandlerFactory);
>> >> >     participant.start();
>> >> >
>> >> >     admin.enableInstance(CLUSTER_NAME, participantId.toString(), true);
>> >> >
>> >> >     return participant;
>> >> > }
>> >> >
>> >> > Adding a 3s sleep after registerStateModelFactory works. Any idea what
>> >> > is happening?
>> >> >
>> >> > thanks,
>> >> > Kishore G
>> >> >
>> >> >
>> >> >
>> >> >
>> >>
>> >
>> >
>>
>
>
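The thread does not spell out why a 3-second sleep after registerStateModelFactory hides the NPE, beyond pointing at HELIX-548. As a generic sketch only, assuming the handler registration becomes visible on another thread (an assumption for illustration, not a statement about Helix 0.7.1 internals), the Java below shows why a fixed sleep merely makes the good ordering likely, while waiting on an explicit completion signal guarantees it. StartupRaceSketch, registerAsync and dispatchPending are hypothetical names.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Generic illustration of an ordering race; names are hypothetical, not Helix API.
public class StartupRaceSketch {

    private final Map<String, Runnable> handlers = new ConcurrentHashMap<>();
    private final ExecutorService registrar = Executors.newSingleThreadExecutor();

    // Assumption for illustration: the handler becomes visible on another thread,
    // and the returned future completes once it is.
    public CompletableFuture<Void> registerAsync(String type, Runnable handler) {
        return CompletableFuture.runAsync(() -> handlers.put(type, handler), registrar);
    }

    // Stands in for start(): pending messages are dispatched as soon as the listener attaches.
    // Returns false in the situation where the reported code hit the NPE (no handler yet).
    public boolean dispatchPending(String type) {
        Runnable handler = handlers.get(type);
        if (handler == null) {
            return false;
        }
        handler.run();
        return true;
    }

    public void shutdown() throws InterruptedException {
        registrar.shutdown();
        registrar.awaitTermination(1, TimeUnit.SECONDS);
    }

    public static void main(String[] args) throws Exception {
        StartupRaceSketch node = new StartupRaceSketch();
        CompletableFuture<Void> registered =
            node.registerAsync("STATE_TRANSITION", () -> System.out.println("transition handled"));
        // Thread.sleep(3000) here would only make it likely that registration has finished;
        // waiting on the future guarantees it happens before dispatch.
        registered.get(5, TimeUnit.SECONDS);
        System.out.println("dispatched: " + node.dispatchPending("STATE_TRANSITION"));
        node.shutdown();
    }
}
```

The design point is the happens-before edge: completing the future after the put, and calling get() before dispatch, orders the two steps regardless of scheduling, which no fixed sleep can promise.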


