qpid-users mailing list archives

From Alan Conway <acon...@redhat.com>
Subject Re: QPID 0.8 Cluster. Node failure.
Date Wed, 17 Aug 2011 13:20:59 GMT
On 08/17/2011 09:14 AM, Zhemzhitsky Sergey wrote:
> Hi there,
>
> I have a two-node cluster built with qpid 0.8 and corosync 1.2.3.
>
> From time to time one node of the cluster stops running.
> However, there is nothing special in the log files of the QPID process.
>
> 2011-08-16 09:18:33 warning JournalInactive:TplStore timer woken up 192ms late, overrunning by 192ms [taking 6297ns]
> 2011-08-16 09:18:33 warning JournalInactive:smx.stdint.finbroker timer woken up 192ms late
> 2011-08-16 12:12:03 warning JournalInactive:TplStore timer callback overran by 13ms [taking 6107ns]
> 2011-08-17 12:00:26 warning JournalInactive:TplStore timer callback overran by 3ms [taking 6848ns]
> 2011-08-17 16:35:13 notice cluster(10.20.3.125:1918 READY) configuration change: 10.20.3.125:1918
> 2011-08-17 16:35:13 notice cluster(10.20.3.125:1918 READY) Members left: 10.20.3.120:3728
> 2011-08-17 16:35:13 notice cluster(10.20.3.125:1918 READY)Sole member of cluster, marking store clean.
> 2011-08-17 16:35:13 notice cluster(10.20.3.125:1918 READY) last broker standing, update queue policies
> 2011-08-17 16:35:14 notice cluster(10.20.3.125:1918 READY) configuration change:
> 2011-08-17 16:35:14 notice cluster(10.20.3.125:1918 READY) Members left: 10.20.3.125:1918
>
> At the same time, the corosync log contains the line "A processor failed, forming new configuration":
>
> Aug 08 11:55:13 corosync [CPG   ] downlist received left_list: 1
> Aug 08 11:55:13 corosync [CPG   ] chosen downlist from node r(0) ip(10.20.3.125)
> Aug 08 11:55:13 corosync [MAIN  ] Completed service synchronization, ready to provide service.
> Aug 17 16:35:12 corosync [TOTEM ] A processor failed, forming new configuration.
> Aug 17 16:35:13 corosync [TOTEM ] A processor joined or left the membership and a new membership was formed.
> Aug 17 16:35:13 corosync [CPG   ] downlist received left_list: 1
> Aug 17 16:35:13 corosync [CPG   ] chosen downlist from node r(0) ip(10.20.3.125)
> Aug 17 16:35:13 corosync [MAIN  ] Completed service synchronization, ready to provide service.
> Aug 17 16:35:14 corosync [TOTEM ] A processor joined or left the membership and a new membership was formed.
> Aug 17 16:35:14 corosync [CPG   ] downlist received left_list: 0
> Aug 17 16:35:14 corosync [CPG   ] downlist received left_list: 1
> Aug 17 16:35:14 corosync [CPG   ] chosen downlist from node r(0) ip(10.20.3.120)
> Aug 17 16:35:14 corosync [MAIN  ] Completed service synchronization, ready to provide service.
>
> As a rule, one node of the QPID cluster becomes unavailable just after "A processor failed, forming new configuration" appears in the corosync log.

I suspect the qpid broker crashes just *before* the CPG message; the CPG message 
is reporting that the process has already failed.
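One way to check that ordering is to merge the qpidd and corosync logs onto a single timeline. The Python sketch below is illustrative only: it assumes both logs come from the same host (so the clocks agree) and keep exactly the timestamp formats quoted above, and the helper names (parse_qpid, parse_corosync, merge_timeline) are hypothetical. Both logs have one-second resolution, so events within the same second cannot be ordered reliably.

```python
from datetime import datetime


def parse_qpid(line):
    # qpidd lines start with "YYYY-MM-DD HH:MM:SS" (19 characters).
    return datetime.strptime(line[:19], "%Y-%m-%d %H:%M:%S"), line[20:]


def parse_corosync(line, year):
    # corosync lines start with "Mon DD HH:MM:SS" (15 characters) and omit the year,
    # so the caller has to supply it.
    stamp = datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
    return stamp, line[16:]


def merge_timeline(qpid_lines, corosync_lines, year):
    """Return (timestamp, source, message) tuples sorted by timestamp."""
    events = []
    for line in qpid_lines:
        stamp, msg = parse_qpid(line)
        events.append((stamp, "qpid", msg))
    for line in corosync_lines:
        stamp, msg = parse_corosync(line, year)
        events.append((stamp, "corosync", msg))
    # Python's sort is stable: events in the same second keep their input
    # order (qpid first), since the logs cannot resolve anything finer.
    return sorted(events, key=lambda event: event[0])


if __name__ == "__main__":
    qpid = ["2011-08-17 16:35:13 notice cluster(10.20.3.125:1918 READY) Members left: 10.20.3.120:3728"]
    corosync = ["Aug 17 16:35:12 corosync [TOTEM ] A processor failed, forming new configuration."]
    for stamp, source, msg in merge_timeline(qpid, corosync, 2011):
        print(stamp, source, msg)
```

Fed the full logs, this shows which component noticed the failure first; it cannot show the crash itself, which is why a core file from the failed broker is the more useful evidence.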

Does the failed process leave a core file? Check that you are allowing core 
files: "ulimit -c" should report "unlimited" in the context that starts the 
broker. You can verify that core files are allowed by running "kill -abrt" on 
a broker, which should force a core file.
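The same checks can be scripted. This is a minimal Python sketch, assuming Linux: it reads /proc/<pid>/limits, which shows the limits of an already running process (such as the broker) rather than those of your own shell. The helper names are hypothetical. allow_core_files() is roughly "ulimit -c <hard limit>" for the process that will later launch the broker; it only raises the soft limit up to the existing hard limit, which needs no special privileges.

```python
import os
import resource
import sys

CORE_LABEL = "Max core file size"


def core_limit_from_proc(text):
    """Return the (soft, hard) core-size limits from /proc/<pid>/limits text."""
    for line in text.splitlines():
        if line.startswith(CORE_LABEL):
            # Columns after the label: soft limit, hard limit, units.
            soft, hard = line[len(CORE_LABEL):].split()[:2]
            return soft, hard
    raise ValueError("no 'Max core file size' line found")


def broker_core_limit(pid):
    # Linux only: limits of the already running process with this pid.
    with open(f"/proc/{pid}/limits") as limits:
        return core_limit_from_proc(limits.read())


def allow_core_files():
    # Raise this process's soft core limit to its hard limit; children
    # started afterwards (e.g. a broker) inherit the new limit.
    _, hard = resource.getrlimit(resource.RLIMIT_CORE)
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
    return resource.getrlimit(resource.RLIMIT_CORE)


if __name__ == "__main__":
    print("core limit now (soft, hard):", allow_core_files())
    if sys.platform.startswith("linux"):
        # Substitute the broker's pid here to inspect the running broker.
        print("this process:", broker_core_limit(os.getpid()))
```

Note that /proc reports an unlimited value as the string "unlimited", while resource.getrlimit returns resource.RLIM_INFINITY. From Python, os.kill(pid, signal.SIGABRT) is the counterpart of "kill -abrt"; where the core file is written depends on the kernel's core_pattern setting.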

---------------------------------------------------------------------
Apache Qpid - AMQP Messaging Implementation
Project:      http://qpid.apache.org
Use/Interact: mailto:users-subscribe@qpid.apache.org

