qpid-users mailing list archives

From Adel Boutros <Adelbout...@live.com>
Subject Re: Testing failover on dispatcher/java-broker cluster
Date Fri, 30 Sep 2016 14:26:06 GMT
Hello Ted,


I confirm all my tests are GREEN at the head of the 0.6.x branch.


For reference:

Qpid Java Broker: 6.0.4

Qpid Proton: 0.12.2

Compiler: gcc 4.9.1

OS: Linux Red Hat


Regards,

Adel

________________________________
From: Adel Boutros <Adelboutros@live.com>
Sent: Friday, September 30, 2016 3:07:56 PM
To: users@qpid.apache.org
Subject: Re: Testing failover on dispatcher/java-broker cluster

Great!


I have synced your changes and will run my tests.

I will get back to you with the results as soon as possible.


Regards,

Adel

________________________________
From: Ted Ross <tross@redhat.com>
Sent: Friday, September 30, 2016 2:39:51 PM
To: users@qpid.apache.org
Subject: Re: Testing failover on dispatcher/java-broker cluster

Done.  I've pushed the four cherry-picked commits to the 0.6.x branch if
you'd like to give it a go.

-Ted

On 09/30/2016 05:47 AM, Adel Boutros wrote:
> Hello Ted,
>
>
> Following discussions here (http://qpid.2158936.n2.nabble.com/Dispatch-router-0-6-1-Configuration-bugs-td7651334.html), can DISPATCH-500 be included in the minor release?
>
>
> PS: It still hasn't solved my issue below, but I will continue the analysis on the other thread.
>
>
> Regards,
>
> Adel
>
>
> ________________________________
> From: Adel Boutros <Adelboutros@live.com>
> Sent: Thursday, September 29, 2016 5:01:45 PM
> To: users@qpid.apache.org
> Subject: Re: Testing failover on dispatcher/java-broker cluster
>
> I would expect what you have described; however, it doesn't seem to be the case.
>
>
> delete/recreate mobile address:
>
> qdmanage -b amqp://localhost:10501 delete --type=address --name haProxy.queue.addr
> qdmanage -b amqp://localhost:10501 create --type=address prefix=haProxy.queue waypoint=true name=haProxy.queue.addr
>
> The stats remain at a positive value (10 10). If I restart the dispatchers without the inter-router connection, I don't have the issue.
>
> Router Addresses
>   class   addr           phs  distrib   in-proc  local  remote  cntnr  in  out  thru  to-proc  from-proc
>   ======================================================================================================
>   mobile  haProxy.queue  1    balanced  0        0      0       0      0   0    0     0        0
>   mobile  haProxy.queue  0    balanced  0        1      0       0      10  10   0     0        0
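Editor's sketch: to make the check described above repeatable, a minimal Python helper can parse a captured `qdstat -a` listing and report the rows whose "in"/"out" counters are non-zero. This is only an illustration: the column list is copied from the table quoted in this message and may differ in other router versions, and the function names (`parse_address_rows`, `rows_with_traffic`) are hypothetical, not part of any Qpid tool.

```python
# Column names as they appear in the `qdstat -a` listing quoted above.
# Assumption: other router versions may print different columns.
COLUMNS = ["class", "addr", "phs", "distrib", "in-proc", "local", "remote",
           "cntnr", "in", "out", "thru", "to-proc", "from-proc"]

# Sample input: the two rows reported in this message.
SAMPLE = """\
Router Addresses
  class   addr           phs  distrib   in-proc  local  remote  cntnr  in  out  thru  to-proc  from-proc
  ======================================================================================================
  mobile  haProxy.queue  1    balanced  0        0      0       0      0   0    0     0        0
  mobile  haProxy.queue  0    balanced  0        1      0       0      10  10   0     0        0
"""


def parse_address_rows(text):
    """Return one dict per data row; title, header and separator lines are skipped."""
    rows = []
    for line in text.splitlines():
        # Tolerate e-mail quote markers in pasted output.
        fields = line.lstrip("> ").split()
        if len(fields) != len(COLUMNS) or fields[0] == "class":
            continue
        rows.append(dict(zip(COLUMNS, fields)))
    return rows


def rows_with_traffic(rows):
    """Rows whose 'in' or 'out' counter is non-zero."""
    return [r for r in rows if int(r["in"]) != 0 or int(r["out"]) != 0]


if __name__ == "__main__":
    for row in rows_with_traffic(parse_address_rows(SAMPLE)):
        print(row["addr"], "phase", row["phs"], "in:", row["in"], "out:", row["out"])
```

Running the check right after the delete/recreate commands would flag the phase-0 row above as the one that kept its old counters.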
>
>
> Adel
>
> ________________________________
> From: Ted Ross <tross@redhat.com>
> Sent: Thursday, September 29, 2016 4:55 PM
> To: users@qpid.apache.org
> Subject: Re: Testing failover on dispatcher/java-broker cluster
>
>
>
> On 09/29/2016 10:47 AM, Adel Boutros wrote:
>> They seem fair enough and quite related.
>>
>>
>> As a side note, I have a bug with the dispatch router 0.6.1 but I haven't submitted it yet because I haven't reduced the test case yet.
>>
>> In summary: I connect 2 dispatchers (inter-router) and then delete the "inter-router" connector/listener. If I then delete and recreate a mobile address which has received a message on one of the dispatchers, the "in" and "out" stats do not reset to 0 in "qdstat -a" but remain at the old values. However, they reset correctly on the other router.
>
> What exactly do you mean by "delete and recreate a mobile address"?
>
> If an address is removed from the table, the next time it appears, a new
> record will be created for that address.  The new record will have
> zeroed statistics.  What behavior are you expecting?
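Editor's sketch: the behaviour Ted describes (removing an address drops its record, and the next appearance creates a fresh record with zeroed statistics) can be pictured with a toy Python model. This is not the router's code; `AddressTable`, `AddressStats` and their methods are hypothetical names used only to illustrate the expectation.

```python
from dataclasses import dataclass


@dataclass
class AddressStats:
    """Per-address counters, loosely mirroring the 'in' and 'out' columns."""
    deliveries_in: int = 0
    deliveries_out: int = 0


class AddressTable:
    """Toy address table: one statistics record per address name."""

    def __init__(self):
        self._records = {}

    def lookup(self, addr):
        # A record is created on first appearance, with zeroed counters.
        return self._records.setdefault(addr, AddressStats())

    def record_delivery(self, addr):
        stats = self.lookup(addr)
        stats.deliveries_in += 1
        stats.deliveries_out += 1

    def remove(self, addr):
        # Removing the address discards its record together with its counters.
        self._records.pop(addr, None)


if __name__ == "__main__":
    table = AddressTable()
    for _ in range(10):
        table.record_delivery("haProxy.queue")
    print(table.lookup("haProxy.queue"))   # counters after ten deliveries
    table.remove("haProxy.queue")
    print(table.lookup("haProxy.queue"))   # fresh record after re-creation
```

In this model the second lookup always starts from zero, which is the expected outcome that the report in this thread says was not observed on one of the two routers.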
>
>>
>>
>> Have you encountered something similar? Once I have a reduced test case, I will post it in a different thread of course.
>>
>>
>> Regards,
>>
>> Adel
>>
>> ________________________________
>> From: Ted Ross <tross@redhat.com>
>> Sent: Thursday, September 29, 2016 4:38:26 PM
>> To: users@qpid.apache.org
>> Subject: Re: Testing failover on dispatcher/java-broker cluster
>>
>> Sorry, those Jira numbers and descriptions are mismatched.  Here's the
>> correct list:
>>
>>     - DISPATCH-496 - Activation of an autolink does not result in issuing
>>                      credit to a blocked sender
>>     - DISPATCH-505 - Eventual loss of credit on inter-router control
>>                      links when the topology changes
>>     - DISPATCH-523 - Topology changes can cause in-flight deliveries to
>>                      be stuck in the ingress router
>>
>>
>> On 09/29/2016 10:35 AM, Ted Ross wrote:
>>>
>>> On 09/24/2016 05:32 AM, Adel Boutros wrote:
>>>> We are indeed in favor of a minor release as long as the latest
>>>> version is still 0.6.x and we are willing to re-launch our tests and
>>>> give feedback on the release candidate once provided (It shouldn't
>>>> take us more than a day to compile and test).
>>>> Do you have a list of fixes in mind?
>>>
>>> I've identified three fixes that look like good candidates for 0.6.2:
>>>
>>>   - DISPATCH-496 - Topology changes can cause in-flight deliveries to
>>>                    be stuck in the ingress router
>>>   - DISPATCH-505 - Eventual loss of credit on inter-router control
>>>                    links when the topology changes
>>>   - DISPATCH-523 - Activation of an autolink does not result in issuing
>>>                    credit to a blocked sender
>>>
>>> These are all stability-related issues.
>>>
>>> Thoughts?
>>>
>>> -Ted
>>>
>>>> Regards, Adel
>>>>
>>>>> Subject: Re: Testing failover on dispatcher/java-broker cluster
>>>>> To: users@qpid.apache.org
>>>>> From: tross@redhat.com
>>>>> Date: Fri, 23 Sep 2016 17:23:57 -0400
>>>>>
>>>>> Hi Adel,
>>>>>
>>>>> A minor release is always possible.  It's up to us, the community, to
>>>>> decide whether and when to produce one.  I'm in favor of releasing an
>>>>> 0.6.2 with some small backports to fix bugs for users that want to stay
>>>>> on Proton 0.12.
>>>>>
>>>>> -Ted
>>>>>
>>>>> On 09/23/2016 09:44 AM, Adel Boutros wrote:
>>>>>> Hello Ted,
>>>>>> Did you happen to have the time to check if a minor release is
>>>>>> possible?
>>>>>> Regards, Adel
>>>>>>
>>>>>>> From: adelboutros@live.com
>>>>>>> To: users@qpid.apache.org
>>>>>>> Subject: RE: Testing failover on dispatcher/java-broker cluster
>>>>>>> Date: Tue, 20 Sep 2016 15:13:03 +0200
>>>>>>>
>>>>>>> Hello Ted,
>>>>>>>
>>>>>>> I confirm the fix solved the issue.
>>>>>>>
>>>>>>> Would it be possible to do a 0.6.2 release? We cannot compile newer
>>>>>>> versions of Proton (we currently use 0.12.2) due to lack of
>>>>>>> resources on our side and we really need this fix for our tests.
>>>>>>>
>>>>>>> Regards,
>>>>>>> Adel
>>>>>>>
>>>>>>>> Subject: Re: Testing failover on dispatcher/java-broker cluster
>>>>>>>> To: users@qpid.apache.org
>>>>>>>> From: tross@redhat.com
>>>>>>>> Date: Mon, 19 Sep 2016 12:18:23 -0400
>>>>>>>>
>>>>>>>> Hi Adel,
>>>>>>>>
>>>>>>>> It's a one-liner and it applies cleanly to the 0.6.x branch.
>>>>>>>>
>>>>>>>> https://git-wip-us.apache.org/repos/asf?p=qpid-dispatch.git;h=41b7407
>>>>>>>>
>>>>>>>> -Ted
>>>>>>>>
>>>>>>>>
>>>>>>>> On 09/19/2016 11:41 AM, Adel Boutros wrote:
>>>>>>>>> Hello Ted,
>>>>>>>>>
>>>>>>>>> Antoine is on vacation so I will be taking over this task.
>>>>>>>>>
>>>>>>>>> Does this fix have any dependencies? We would like to apply it on
>>>>>>>>> 0.6.1 without other fixes because it seems the master branch
>>>>>>>>> requires Proton 0.13.0 minimum, whereas we currently have 0.12.2
>>>>>>>>> and cannot upgrade for the time being.
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>> Adel
>>>>>>>>>
>>>>>>>>>> Subject: Re: Testing failover on dispatcher/java-broker cluster
>>>>>>>>>> To: users@qpid.apache.org
>>>>>>>>>> From: tross@redhat.com
>>>>>>>>>> Date: Fri, 16 Sep 2016 16:53:05 -0400
>>>>>>>>>>
>>>>>>>>>> Antoine,
>>>>>>>>>>
>>>>>>>>>> I think I know what that problem is.  I believe you've stumbled
>>>>>>>>>> upon this issue:
>>>>>>>>>>
>>>>>>>>>> https://issues.apache.org/jira/browse/DISPATCH-496
>>>>>>>>>>
>>>>>>>>>> Your second delivery, the one resulting in a timeout, is causing
>>>>>>>>>> the inbound link to be blocked (i.e. it has undelivered messages).
>>>>>>>>>> When the broker reattaches, the blocked links are supposed to
>>>>>>>>>> become unblocked but they don't in the case of auto-links.
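Editor's sketch: the sequence Ted describes can be mimicked with a very small state machine in Python. This is a toy model under stated assumptions, not the router's implementation: the class `ToyRouter`, the flag `unblock_on_reattach` and the helper `run_scenario` are hypothetical, and the model reduces the link state to a single "blocked" flag.

```python
class ToyRouter:
    """Toy model of one inbound link routed to a broker through an auto-link."""

    def __init__(self, unblock_on_reattach):
        # unblock_on_reattach=False stands in for the behaviour reported in
        # DISPATCH-496; True stands in for the behaviour after the fix.
        self.unblock_on_reattach = unblock_on_reattach
        self.broker_up = True
        self.link_blocked = False

    def send(self):
        """Return 'disposition' if the delivery is settled, else 'timeout'."""
        if self.broker_up and not self.link_blocked:
            return "disposition"
        # The delivery cannot be forwarded: the inbound link now holds an
        # undelivered message and is considered blocked.
        self.link_blocked = True
        return "timeout"

    def stop_broker(self):
        self.broker_up = False

    def restart_broker(self):
        self.broker_up = True
        if self.unblock_on_reattach:
            self.link_blocked = False


def run_scenario(unblock_on_reattach, send_while_down):
    """Replay the steps from the report below; return the outcome of each send."""
    router = ToyRouter(unblock_on_reattach)
    results = [router.send()]             # message sent while the broker is up
    router.stop_broker()
    if send_while_down:
        results.append(router.send())     # message sent while the broker is down
    router.restart_broker()
    results.append(router.send())         # message sent after the restart
    return results


if __name__ == "__main__":
    print(run_scenario(unblock_on_reattach=False, send_while_down=True))
    print(run_scenario(unblock_on_reattach=False, send_while_down=False))
    print(run_scenario(unblock_on_reattach=True, send_while_down=True))
```

The first run reproduces the reported double timeout, the second corresponds to skipping the send while the broker is down, and the third shows the outcome one would expect once blocked links are released on reattach.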
>>>>>>>>>>
>>>>>>>>>> This has been fixed on the master branch if you'd like to try
>>>>>>>>>> applying the patch.
>>>>>>>>>>
>>>>>>>>>> -Ted
>>>>>>>>>>
>>>>>>>>>> On 09/15/2016 04:56 AM, Antoine Chevin wrote:
>>>>>>>>>>> Hi Ted,
>>>>>>>>>>>
>>>>>>>>>>> You’re right, the connection close looked strange before the
>>>>>>>>>>> stopping of the broker. I manually added the annotation (# stopping
>>>>>>>>>>> the broker) and was wrong about its position. I replayed the test
>>>>>>>>>>> and the connection close happens *after* the broker stop. I assume
>>>>>>>>>>> it is the broker that initiates it.
>>>>>>>>>>>
>>>>>>>>>>> I found something interesting. In my test, I always sent a message
>>>>>>>>>>> when the broker is down, expecting to get a JmsSendTimedOutException
>>>>>>>>>>> (waiting for the disposition frame). I assumed this was harmless,
>>>>>>>>>>> but it turns out it is not. When I don’t do that, I can send a
>>>>>>>>>>> message after the broker restart. So to sum up the experiment I did:
>>>>>>>>>>>
>>>>>>>>>>> * I use Wireshark between the JMS client and the dispatcher. *
>>>>>>>>>>>
>>>>>>>>>>> 1) Using JMS I establish a connection to the dispatcher and create
>>>>>>>>>>> a message producer (Wireshark: connection open -> attach)
>>>>>>>>>>> 2) I’m able to send a message to the broker through the dispatcher
>>>>>>>>>>> (Wireshark: transfer -> disposition)
>>>>>>>>>>> 3) I stop the broker
>>>>>>>>>>> 4) With the same link, I send a message and I get a
>>>>>>>>>>> JmsSendTimedOutException (waiting for the disposition frame)
>>>>>>>>>>> (Wireshark: transfer)
>>>>>>>>>>> 5) I restart the broker
>>>>>>>>>>> 6) With the same link, I try to send a message and I get a
>>>>>>>>>>> JmsSendTimedOutException for the same reason (waiting for the
>>>>>>>>>>> disposition frame) (Wireshark: transfer)
>>>>>>>>>>>
>>>>>>>>>>> If I skip step (4), I cannot reproduce step (6) and my messages
>>>>>>>>>>> arrive (Wireshark: transfer -> disposition) at the restarted broker.
>>>>>>>>>>>
>>>>>>>>>>> I hope it makes it clearer for you. Sorry for my rookie
>>>>>>>>>>> mistakes :-).
>>>>>>>>>>>
>>>>>>>>>>> Note: My colleague and I ran a small experiment to identify whether
>>>>>>>>>>> the problem comes from JMS or the AMQP protocol. He changed the code
>>>>>>>>>>> of the java broker to not send the disposition frame one time out
>>>>>>>>>>> of two.
>>>>>>>>>>>
>>>>>>>>>>> We got these results:
>>>>>>>>>>>
>>>>>>>>>>> * I use Wireshark between the JMS client and the patched broker. *
>>>>>>>>>>>
>>>>>>>>>>> 1) Using JMS I establish a connection to the patched broker and
>>>>>>>>>>> create a message producer (Wireshark: connection open -> attach)
>>>>>>>>>>> 2) I send a message to the broker and it replies with the
>>>>>>>>>>> disposition frame (Wireshark: transfer -> disposition)
>>>>>>>>>>> 3) I send a message to the broker which drops the disposition
>>>>>>>>>>> frame. I get a send timeout in JMS (Wireshark: transfer)
>>>>>>>>>>> 4) I send a message to the broker and it replies with the
>>>>>>>>>>> disposition frame (Wireshark: transfer -> disposition). It works
>>>>>>>>>>> fine.
>>>>>>>>>>>
>>>>>>>>>>> We assume that there is something going on in the dispatcher.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Antoine
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> ---------------------------------------------------------------------
>>>>>>>>>>
>>>>>>>>>> To unsubscribe, e-mail: users-unsubscribe@qpid.apache.org
>>>>>>>>>> For additional commands, e-mail: users-help@qpid.apache.org
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>

