Return-Path: X-Original-To: apmail-helix-user-archive@minotaur.apache.org Delivered-To: apmail-helix-user-archive@minotaur.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id C528D100D7 for ; Mon, 16 Feb 2015 06:52:41 +0000 (UTC) Received: (qmail 69347 invoked by uid 500); 16 Feb 2015 06:52:32 -0000 Delivered-To: apmail-helix-user-archive@helix.apache.org Received: (qmail 69301 invoked by uid 500); 16 Feb 2015 06:52:32 -0000 Mailing-List: contact user-help@helix.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: user@helix.apache.org Delivered-To: mailing list user@helix.apache.org Received: (qmail 69281 invoked by uid 99); 16 Feb 2015 06:52:32 -0000 Received: from nike.apache.org (HELO nike.apache.org) (192.87.106.230) by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 16 Feb 2015 06:52:32 +0000 X-ASF-Spam-Status: No, hits=2.5 required=5.0 tests=FREEMAIL_REPLY,HTML_MESSAGE,RCVD_IN_DNSWL_LOW,SPF_PASS X-Spam-Check-By: apache.org Received-SPF: pass (nike.apache.org: domain of nehzgnahz@gmail.com designates 209.85.223.182 as permitted sender) Received: from [209.85.223.182] (HELO mail-ie0-f182.google.com) (209.85.223.182) by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 16 Feb 2015 06:52:06 +0000 Received: by iebtr6 with SMTP id tr6so21411766ieb.7; Sun, 15 Feb 2015 22:49:49 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; bh=Yvgn8xLwlDEicQROme1Lvd7n9F/UUku0mIt7Fa53hR8=; b=bzhCtMTKAQEHYVjiG1CKGuiuEo+aHhh3K87zgOFD4H75d1W1CSVgAquJktAtc4eFU3 4ZeHgVnpWqxnTpK3o6R9ZfukhBRQZgeMJx2eiw9AabCFTFrcZ+AaqdxRUl/I1AZ7GzQr n6PSyUkpNQpj1veeNv5ku2a76mWpx+THSEkvYnYzLJCgZtK2/PydbX3xAkOh0y6ncPOU +N3PQhbb1i0kxp8Bh1xGVt3Hy6aQEFfANTQkLVLREubkqjGdnaKnrRwcg4e4ObUOrc/s jH9B28m0f1ar3VrPWp8+No9DWD13zvq/Y83XhPCJDwwoL5TBOH2BDRhXw40Hf/sh5rzL 8ssQ== MIME-Version: 1.0 X-Received: by 10.107.12.167 with SMTP id 39mr6292856iom.71.1424069389540; Sun, 15 Feb 2015 22:49:49 -0800 (PST) Received: by 10.43.157.72 with HTTP; Sun, 15 Feb 2015 22:49:49 -0800 (PST) In-Reply-To: References: Date: Sun, 15 Feb 2015 22:49:49 -0800 Message-ID: Subject: Re: NPE during start up From: Zhen Zhang To: "user@helix.apache.org" Cc: "dev@helix.apache.org" Content-Type: multipart/alternative; boundary=001a113fb86619614c050f2efe2b X-Virus-Checked: Checked by ClamAV on apache.org --001a113fb86619614c050f2efe2b Content-Type: text/plain; charset=UTF-8 I think the NPE is not fatal. It happens when no message handler factory is registered for this message type. The message will not be removed and remain in UNREAD state. Later when the message handler factory is registered via: DefaultMessagingService#registerMessageHandlerFactory, we will send a NOP message, which will in turn trigger HelixTaskExecutor to process all UNREAD messages. We should definitely fix this by logging a warning message instead of throwing an NPE. Thanks, Jason On Sun, Feb 15, 2015 at 7:30 PM, kishore g wrote: > Controller assuming the state transition occurred is even more dangerous. > > > > > > On Sun, Feb 15, 2015 at 7:18 PM, vlad.gm@gmail.com > wrote: > >> In my experience it was fatal. The callback would jot be called but the >> controller would somehow assume the state transition occurred. >> On Feb 15, 2015 7:13 PM, "kishore g" wrote: >> >> > Thanks Vlad. That explains the problem. That also explains how adding >> > sleep of 3seconds work. >> > >> > Jason, is this exception fatal?. Will the message be processed again >> after >> > the handler is added. >> > >> > thanks, >> > Kishore G >> > >> > On Sun, Feb 15, 2015 at 6:41 PM, vlad.gm@gmail.com >> > wrote: >> > >> >> https://issues.apache.org/jira/browse/HELIX-548 >> >> On Feb 15, 2015 6:38 PM, "kishore g" wrote: >> >> >> >> > Hi Vlad, >> >> > >> >> > Was there any jira associated with it? >> >> > >> >> > thanks. >> >> > Kishore G >> >> > >> >> > On Sun, Feb 15, 2015 at 4:36 PM, vlad.gm@gmail.com < >> vlad.gm@gmail.com> >> >> > wrote: >> >> > >> >> >> Looks like the same problem we encountered recently. >> >> >> >> >> >> Regards, >> >> >> Vlad >> >> >> On Feb 15, 2015 4:35 PM, "kishore g" wrote: >> >> >> >> >> >> > Steph described this problem on IRC. >> >> >> > >> >> >> > He is using 0.7.1. On connecting to cluster he gets this NPE >> >> >> > >> >> >> > http://pastebin.com/YE3fwK5i >> >> >> > >> >> >> > java.lang.NullPointerException >> >> >> > at >> >> >> > >> >> >> >> >> >> org.apache.helix.messaging.handling.HelixTaskExecutor.createMessageHandler(HelixTaskExecutor.java:661) >> >> >> > at >> >> >> > >> >> >> >> >> >> org.apache.helix.messaging.handling.HelixTaskExecutor.onMessage(HelixTaskExecutor.java:581) >> >> >> > at >> >> >> > >> >> >> >> >> >> org.apache.helix.manager.zk.ZkCallbackHandler.invoke(ZkCallbackHandler.java:202) >> >> >> > at >> >> >> > >> >> >> >> >> >> org.apache.helix.manager.zk.ZkCallbackHandler.init(ZkCallbackHandler.java:336) >> >> >> > at >> >> >> > >> >> >> >> >> >> org.apache.helix.manager.zk.ZkCallbackHandler.(ZkCallbackHandler.java:130) >> >> >> > at >> >> >> > >> >> >> >> >> >> org.apache.helix.manager.zk.ZkHelixConnection.addListener(ZkHelixConnection.java:533) >> >> >> > at >> >> >> > >> >> >> >> >> >> org.apache.helix.manager.zk.ZkHelixConnection.addMessageListener(ZkHelixConnection.java:267) >> >> >> > at >> >> >> > >> >> >> >> >> >> org.apache.helix.manager.zk.ZkHelixParticipant.setupMsgHandler(ZkHelixParticipant.java:347) >> >> >> > at >> >> >> > >> >> >> >> >> >> org.apache.helix.manager.zk.ZkHelixParticipant.init(ZkHelixParticipant.java:383) >> >> >> > at >> >> >> > >> >> >> >> >> >> org.apache.helix.manager.zk.ZkHelixParticipant.onConnected(ZkHelixParticipant.java:401) >> >> >> > at >> >> >> > >> >> >> >> >> >> org.apache.helix.manager.zk.ZkHelixParticipant.start(ZkHelixParticipant.java:428) >> >> >> > at >> >> >> > >> >> >> >> >> >> com.example.ProtostuffServerNode.spinUpParticipant(ProtostuffServerNode.java:134) >> >> >> > >> >> >> > >> >> >> > Here is his connection code. >> >> >> > >> >> >> > http://pastebin.com/QRfVU1tc >> >> >> > >> >> >> > private static HelixParticipant spinUpParticipant(HelixAdmin >> admin, >> >> >> > ParticipantId participantId) { >> >> >> > LOGGER.info("Starting up "+participantId); >> >> >> > HelixConnection connection = new >> ZkHelixConnection( >> >> >> > ZK_ADDRESS); >> >> >> > connection.connect(); >> >> >> > HelixParticipant participant = connection. >> >> >> > createParticipant(CLUSTER_ID, participantId); >> >> >> > StateMachineEngine stateMach = participant. >> >> >> > getStateMachineEngine(); >> >> >> > >> >> >> > >> StateTransitionHandlerFactory >> >> >> > transitionHandlerFactory = new OnlineOfflineHandlerFactory(); >> >> >> > >> stateMach.registerStateModelFactory(STATE_MODEL_NAME, >> >> >> > transitionHandlerFactory); >> >> >> > participant.start(); >> >> >> > >> >> >> > admin.enableInstance(CLUSTER_NAME, >> >> >> participantId.toString( >> >> >> > ), true); >> >> >> > >> >> >> > return participant; >> >> >> > } >> >> >> > >> >> >> > Adding 3s sleep after registerStateModelFactory works. Any idea >> what >> >> is >> >> >> > happening. >> >> >> > >> >> >> > thanks, >> >> >> > Kishore G >> >> >> > >> >> >> > >> >> >> > >> >> >> > >> >> >> >> >> > >> >> > >> >> >> > >> > >> > > --001a113fb86619614c050f2efe2b Content-Type: text/html; charset=UTF-8 Content-Transfer-Encoding: quoted-printable
I think the NPE is not fatal. It happens when no message h= andler factory is registered for this message type. The message will not be= removed and remain in UNREAD state. Later when the message handler factory= is registered via:
DefaultMessagingService#registerMessageHandlerFacto= ry, we will send a NOP message, which will in turn trigger HelixTaskExecuto= r to process all UNREAD messages. We should definitely fix this by logging = a warning message instead of throwing an NPE.

Than= ks,
Jason

On Sun, Feb 15, 2015 at 7:30 PM, kishore g <g.kishore@gmail.com> wrote:
Controller assuming the state transition occurred i= s even more dangerous.





On Sun, Feb 15, 2015 at 7:18 PM, = vlad.gm@gmail.com <vlad.gm@gmail.com> wrote:
In my experience it was fatal. The callback would jot be called but the<= br> controller would somehow assume the state transition occurred.
On Feb 15, 2015 7:13 PM, "kishore g" <g.kishore@gmail.com> wrot= e:

> Thanks Vlad. That explains the problem. That also explains how adding<= br> > sleep of 3seconds work.
>
> Jason, is this exception fatal?. Will the message be processed again a= fter
> the handler is added.
>
> thanks,
> Kishore G
>
> On Sun, Feb 15, 2015 at 6:41 PM, vlad.gm@gmail.com <vlad.gm@gmail.com>
> wrote:
>
>> https://issues.apache.org/jira/browse/HELIX-548
>> On Feb 15, 2015 6:38 PM, "kishore g" <g.kishore@gmail.com> wrote= :
>>
>> > Hi Vlad,
>> >
>> > Was there any jira associated with it?
>> >
>> > thanks.
>> > Kishore G
>> >
>> > On Sun, Feb 15, 2015 at 4:36 PM, vlad.gm@gmail.com <vlad.gm@gmail.com>
>> > wrote:
>> >
>> >> Looks like the same problem we encountered recently.
>> >>
>> >> Regards,
>> >> Vlad
>> >> On Feb 15, 2015 4:35 PM, "kishore g" <g.kishore@gmail.com&= gt; wrote:
>> >>
>> >> > Steph described this problem on IRC.
>> >> >
>> >> > He is using 0.7.1. On connecting to cluster he gets = this NPE
>> >> >
>> >> > http://pastebin.com/YE3fwK5i
>> >> >
>> >> > java.lang.NullPointerException
>> >> >=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0at
>> >> >
>> >>
>> org.apache.helix.messaging.handling.HelixTaskExecutor.createMessag= eHandler(HelixTaskExecutor.java:661)
>> >> >=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0at
>> >> >
>> >>
>> org.apache.helix.messaging.handling.HelixTaskExecutor.onMessage(He= lixTaskExecutor.java:581)
>> >> >=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0at
>> >> >
>> >>
>> org.apache.helix.manager.zk.ZkCallbackHandler.invoke(ZkCallbackHan= dler.java:202)
>> >> >=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0at
>> >> >
>> >>
>> org.apache.helix.manager.zk.ZkCallbackHandler.init(ZkCallbackHandl= er.java:336)
>> >> >=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0at
>> >> >
>> >>
>> org.apache.helix.manager.zk.ZkCallbackHandler.<init>(ZkCallb= ackHandler.java:130)
>> >> >=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0at
>> >> >
>> >>
>> org.apache.helix.manager.zk.ZkHelixConnection.addListener(ZkHelixC= onnection.java:533)
>> >> >=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0at
>> >> >
>> >>
>> org.apache.helix.manager.zk.ZkHelixConnection.addMessageListener(Z= kHelixConnection.java:267)
>> >> >=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0at
>> >> >
>> >>
>> org.apache.helix.manager.zk.ZkHelixParticipant.setupMsgHandler(ZkH= elixParticipant.java:347)
>> >> >=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0at
>> >> >
>> >>
>> org.apache.helix.manager.zk.ZkHelixParticipant.init(ZkHelixPartici= pant.java:383)
>> >> >=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0at
>> >> >
>> >>
>> org.apache.helix.manager.zk.ZkHelixParticipant.onConnected(ZkHelix= Participant.java:401)
>> >> >=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0at
>> >> >
>> >>
>> org.apache.helix.manager.zk.ZkHelixParticipant.start(ZkHelixPartic= ipant.java:428)
>> >> >=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0at
>> >> >
>> >>
>> com.example.ProtostuffServerNode.spinUpParticipant(ProtostuffServe= rNode.java:134)
>> >> >
>> >> >
>> >> > Here is his connection code.
>> >> >
>> >> > http://pastebin.com/QRfVU1tc
>> >> >
>> >> > private static HelixParticipant spinUpParticipant(He= lixAdmin admin,
>> >> > ParticipantId participantId) {
>> >> >=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0LOGGER.info("Starting up "+participantId);
>> >> >=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0HelixConnection connection =3D new ZkHelixConnection(
>> >> > ZK_ADDRESS);
>> >> >=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0connection.connect();
>> >> >=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0HelixParticipant participant =3D connection.
>> >> > createParticipant(CLUSTER_ID, participantId);
>> >> >=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0StateMachineEngine stateMach =3D participant.
>> >> > getStateMachineEngine();
>> >> >
>> >> >=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0StateTransitionHandlerFactory<LocalTransitionHandler>
>> >> > transitionHandlerFactory =3D new OnlineOfflineHandle= rFactory();
>> >> >=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0stateMach.registerStateModelFactory(STATE_MODEL_NAME,
>> >> > transitionHandlerFactory);
>> >> >=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0participant.start();
>> >> >
>> >> >=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0admin.enableInstance(CLUSTER_NAME,
>> >> participantId.toString(
>> >> > ), true);
>> >> >
>> >> >=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0return participant;
>> >> >=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0}
>> >> >
>> >> > Adding 3s sleep after registerStateModelFactory work= s. Any idea what
>> is
>> >> > happening.
>> >> >
>> >> > thanks,
>> >> > Kishore G
>> >> >
>> >> >
>> >> >
>> >> >
>> >>
>> >
>> >
>>
>
>


--001a113fb86619614c050f2efe2b--