geode-dev mailing list archives

From Roi Apelker <Roi.Apel...@amdocs.com>
Subject RE: continuous query internal mechanism questions
Date Wed, 16 Aug 2017 16:45:02 GMT
Thanks,
The subscription-redundancy is set to "1" and the region is used on 2 nodes (there are more
nodes which are not related to it).

Yes, there is an exception, which I have yet to understand. (This exception causes the closure
of the CQ on this node, and also sends an operation message to the other node telling it to close!)

caught exception while running: 
java.io.IOException: Broken pipe
	at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
	at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
	at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
	at sun.nio.ch.IOUtil.write(IOUtil.java:51)
	at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:471)
	at org.apache.geode.internal.cache.tier.sockets.Message.flushBuffer(Message.java:651)
	at org.apache.geode.internal.cache.tier.sockets.Message.sendBytes(Message.java:632)
	at org.apache.geode.internal.cache.tier.sockets.ChunkedMessage.sendChunk(ChunkedMessage.java:314)
	at org.apache.geode.internal.cache.tier.sockets.ChunkedMessage.sendChunk(ChunkedMessage.java:322)
	at org.apache.geode.internal.cache.tier.sockets.BaseCommand.writeQueryResponseChunk(BaseCommand.java:756)
	at org.apache.geode.internal.cache.tier.sockets.BaseCommandQuery.processQueryUsingParams(BaseCommandQuery.java:225)
	at org.apache.geode.internal.cache.tier.sockets.BaseCommandQuery.processQuery(BaseCommandQuery.java:70)
	at org.apache.geode.internal.cache.tier.sockets.command.ExecuteCQ61.cmdExecute(ExecuteCQ61.java:179)
	at org.apache.geode.internal.cache.tier.sockets.BaseCommand.execute(BaseCommand.java:147)
	at org.apache.geode.internal.cache.tier.sockets.ServerConnection.doNormalMsg(ServerConnection.java:783)
	at org.apache.geode.internal.cache.tier.sockets.ServerConnection.doOneMessage(ServerConnection.java:913)
	at org.apache.geode.internal.cache.tier.sockets.ServerConnection.run(ServerConnection.java:1143)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at org.apache.geode.internal.cache.tier.sockets.AcceptorImpl$1$1.run(AcceptorImpl.java:546)
	at java.lang.Thread.run(Thread.java:745)

Could it be that the client disconnected from the node right after sending this message? (The
client itself continues to run normally...)
The scenario is that after all nodes are initialized, I stop one server out of 2. Sometimes,
1 time out of 5, the CQ stops notifying the client after this stop. Most of the time the CQ
continues to run fine.
I am certain this is related to some timing issue, some registration which fails, something
also related to the filter profiles which are held in the region...
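
On the question of whether a peer disconnect can produce a trace like the one above: as a
general JVM-level illustration (this is not Geode code), the sketch below shows that writing to
a socket whose peer has already closed eventually fails with an IOException. On Linux the
message is typically "Broken pipe" or "Connection reset". The class and method names are
hypothetical. It only demonstrates the socket behaviour and says nothing about why the peer
went away in this scenario.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical demo class; nothing here comes from the Geode code base.
final class BrokenPipeSketch {

    /**
     * Opens a loopback connection, closes the "client" end, then keeps
     * writing from the "server" end until the write fails.
     * Returns the IOException seen by the writer, or null if none occurred.
     */
    static IOException writeToClosedPeer() {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = new Socket(server.getInetAddress(), server.getLocalPort());
             Socket serverSide = server.accept()) {

            client.close(); // the peer goes away before the response is written

            OutputStream out = serverSide.getOutputStream();
            byte[] chunk = new byte[8192];
            // The first write may still succeed because it only fills a local
            // buffer; a later write fails once the reset from the peer arrives.
            for (int i = 0; i < 10_000; i++) {
                try {
                    out.write(chunk);
                } catch (IOException e) {
                    return e;
                }
            }
            return null;
        } catch (IOException setupFailure) {
            // Failure while setting up the sockets, not the case being shown.
            throw new UncheckedIOException(setupFailure);
        }
    }

    public static void main(String[] args) {
        System.out.println("writer saw: " + writeToClosedPeer());
    }
}
```

The exact exception message depends on the operating system, so the sketch returns the
exception rather than matching on its text.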

Thanks

Roi

-----Original Message-----
From: Anilkumar Gingade [mailto:agingade@pivotal.io] 
Sent: Wednesday, August 16, 2017 1:41 AM
To: dev@geode.apache.org
Subject: Re: continuous query internal mechanism questions

In Geode, high availability for subscription events is achieved by having redundant event queues
(HAQueues) on multiple servers; this is configured using the redundancy level on the client connection.
Based on the redundancy level, the client registers CQs on multiple servers. During the subscription
(CQ) registration, it elects/assigns one of the servers to host the primary HAQueue.

The client keeps monitoring the redundancy level as nodes join or fail, and works to keep the
redundancy level satisfied.
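
To make the bookkeeping concrete, here is a toy model in plain Java of the behaviour described
above: the client keeps queues on redundancy + 1 servers, one of them primary, and tops the set
up again as servers fail or join. None of these classes exist in Geode. The class name, the
method names, and the simplified promotion rule (the next secondary in the list becomes
primary) are assumptions for illustration only.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Toy model only: these are not Geode types.
final class RedundancySketch {
    private final int redundancy;                               // plays the role of subscription-redundancy
    private final Set<String> live = new LinkedHashSet<>();     // servers currently reachable
    private final List<String> queueHosts = new ArrayList<>();  // index 0 = primary, rest = secondaries

    RedundancySketch(int redundancy, List<String> servers) {
        this.redundancy = redundancy;
        live.addAll(servers);
        restoreRedundancy();
    }

    /** A server left: drop its queue and try to recruit a replacement. */
    void serverFailed(String server) {
        live.remove(server);
        queueHosts.remove(server); // if it was the primary, the next secondary moves to index 0
        restoreRedundancy();
    }

    /** A server joined: use it if the redundancy level is not yet satisfied. */
    void serverJoined(String server) {
        live.add(server);
        restoreRedundancy();
    }

    /** Adds queue hosts from the live servers until there are redundancy + 1 of them. */
    private void restoreRedundancy() {
        for (String server : live) {
            if (queueHosts.size() >= redundancy + 1) {
                break;
            }
            if (!queueHosts.contains(server)) {
                queueHosts.add(server);
            }
        }
    }

    String primary() {
        return queueHosts.isEmpty() ? null : queueHosts.get(0);
    }

    List<String> secondaries() {
        return queueHosts.size() <= 1
                ? new ArrayList<>()
                : new ArrayList<>(queueHosts.subList(1, queueHosts.size()));
    }

    boolean redundancySatisfied() {
        return queueHosts.size() == redundancy + 1;
    }
}
```

With redundancy 1 and two servers, stopping the primary in this model leaves one queue host and
an unsatisfied redundancy level until another server joins.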

You can find more about HAQueues at
https://cwiki.apache.org/confluence/display/GEODE/HA+Client+Event+Queues

I assume you have a 2-node cluster. What is your subscription redundancy level?

>> For some reason, sometimes there is a failure to complete the first registration
Is there any log message or stack trace reporting the reason for the failure? If it's a dev
environment, you can run the client/server with debug/fine level logging to see additional info.

Are you trying to stop your server while registering the CQs? Can you give more detail about
your test scenario...

-Anil.


On Tue, Aug 15, 2017 at 11:25 AM, Jason Huynh <jasonhuynh@apache.org> wrote:

> I am not quite sure how native client registers cqs. From my understanding:
> with the java api, I believe there is only one message (ExecuteCQ 
> message) that is executed on the server side and then replicated to 
> the other nodes through the profile (OperationMessage).
>
> It seems the extra ExecuteCQ message failing and then closing the cq 
> might be putting the system in a weird state...
>
> On Tue, Aug 15, 2017 at 7:56 AM Roi Apelker <Roi.Apelker@amdocs.com>
> wrote:
>
> > Hi,
> >
> > I have been examining the continuous query registration mechanism 
> > for quite some time. This is related to an issue that I have, where 
> > sometimes a node crashes (1 node out of 2), and the other one does 
> > not send CQ events. The CQ is registered on a partitioned region 
> > which resides on these 2 nodes.
> >
> > I noticed the following behavior, and I wonder if anyone can comment 
> > regarding it, if it is justified or not and what is the reason:
> >
> > 1. When the software using the client (native client) registers for 
> > the CQ, a CQ command (ExecuteCQ61) is received on both servers.
> >  -- is this normal behaviour? Does the client actually send this 
> > command to both servers?
> >
> > 2. When this command is received by a server, and the CQ is 
> > registered, another registration message is sent to the other node 
> > via an OperationMessage (REGISTER_CQ)
> >  -- it seems that normally the server can handle this situation, as 
> > the second registration identifies the previous one and does not 
> > affect it. But the question is: why do we need this 2nd registration 
> > if there is a command sent to each server?
> >
> > 3. For some reason, sometimes there is a failure to complete the 
> > first registration (executed by ExecuteCQ61) and then this failure 
> > causes a closure to the CQ, which is accompanied with a close 
> > request to the other node.
> >  -- I assume by now, since 2 registrations and one closure have 
> > occurred on node 2, the CQ is still active and the client receives notifications.
> >
> > 4. Sometimes, 1 out of 5, once node 1 crashes, I get a cleanup 
> > operation, caused by the crash (via MemberCrashedEvent), and this 
> > also closes the existing CQ, and in this case the CQ in node 2 does 
> > not operate anymore and the client receives no notifications.
> >  -- the fact is that the other 4 times out of 5, I do not get this 
> > cleanup by MemberCrashedEvent (maybe due to some other error), and 
> > the CQ notifications are received normally.
> >
> > Can anyone clear things up for me? Any comment on any of the 
> > statements above will be greatly appreciated.
> >
> > Thanks,
> >
> > Roi
> >
> >
> > -----Original Message-----
> > From: Roi Apelker
> > Sent: Wednesday, August 09, 2017 3:21 PM
> > To: dev@geode.apache.org
> > Subject: RE: continuous query internal mechanism
> >
> > Thank you
> >
> > -----Original Message-----
> > From: Anilkumar Gingade [mailto:agingade@pivotal.io]
> > Sent: Tuesday, August 08, 2017 9:55 PM
> > To: dev@geode.apache.org
> > Subject: Re: continuous query internal mechanism
> >
> > Registered events, I meant, are events generated for interest 
> > registration, "region.registerInterest(*)". And CqEvents are for 
> > CQs registered.
> >
> > -Anil.
> >
> >
> > On Tue, Aug 8, 2017 at 12:27 AM, Roi Apelker 
> > <Roi.Apelker@amdocs.com>
> > wrote:
> >
> > > Thank you
> > >
> > > What is the difference between registered events and CQ events?
> > >
> > > -----Original Message-----
> > > From: Anilkumar Gingade [mailto:agingade@pivotal.io]
> > > Sent: Monday, August 07, 2017 10:12 PM
> > > To: dev@geode.apache.org
> > > Subject: Re: continuous query internal mechanism
> > >
> > > CQ processing on the server side is the same for all clients (Java, C++)...
> > >
> > > The subscription events are sent to the client as ClientUpdateMessage, 
> > > which holds information about registered events and CQ events. The 
> > > client processes this and updates/invokes the client-side 
> > > cache/listeners with the respective event. Look into 
> > > ClientUpdateMessageImpl and CacheClientUpdater (for client-side 
> > > processing).
> > >
> > > -Anil.
> > >
> > >
> > >
> > >
> > > On Mon, Aug 7, 2017 at 11:01 AM, Roi Apelker 
> > > <Roi.Apelker@amdocs.com>
> > > wrote:
> > >
> > > > Thanks,
> > > >
> > > > By the way, is there any difference in the behaviour of the 
> > > > server, if the client that registered the CQ is a native (C++) client?
> > > >
> > > > I have been going over the classes and code for some time and 
> > > > can't seem to find the actual location where a CQ 
> > > > update/notification is sent...
> > > >
> > > > It's like CqEventImpl class is never even generated in this scenario.
> > > >
> > > > If anyone can help here I would be most grateful :-)
> > > >
> > > > Thanks
> > > >
> > > > Roi
> > > >
> > > >
> > > >
> > > > -----Original Message-----
> > > > From: Anilkumar Gingade [mailto:agingade@pivotal.io]
> > > > Sent: Monday, August 07, 2017 8:23 PM
> > > > To: dev@geode.apache.org
> > > > Subject: Re: continuous query internal mechanism
> > > >
> > > > You can find those in CqServiceImpl.process*()...
> > > >
> > > > -Anil.
> > > >
> > > >
> > > > On Mon, Aug 7, 2017 at 9:14 AM, Roi Apelker 
> > > > <Roi.Apelker@amdocs.com>
> > > > wrote:
> > > >
> > > > > Hello,
> > > > >
> > > > > I am trying to look into the code of the continuous query 
> > > > > mechanism, where the GEODE server sends the notification back 
> > > > > to the client.
> > > > >
> > > > > Can anyone point me to the central classes of continuous 
> > > > > query, especially to the one that is responsible for the 
> > > > > calculation of the new data and packing it as a message back 
> > > > > to the client?
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Roi
> > > > >
> > > > > This message and the information contained herein is 
> > > > > proprietary and confidential and subject to the Amdocs policy 
> > > > > statement,
> > > > >
> > > > > you may review at 
> > > > > https://www.amdocs.com/about/email-disclaimer < 
> > > > > https://www.amdocs.com/about/email-disclaimer>
> > > > >