incubator-s4-user mailing list archives

From Matthieu Morel <mmo...@apache.org>
Subject Re: Zookeeper cluster nodes not receiving events
Date Wed, 08 Feb 2012 23:36:35 GMT
Hi Scot,

The numbers 0, 1, 2, 3 in the Dispatcher log represent the logical
partitions to which messages are sent.
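
To compare partitions quickly, the per-partition lines can be pulled out of
the log with a few lines of script. This is only a rough sketch: the regular
expression is inferred from the log lines quoted below, and the function name
partition_counts is made up for illustration, not part of S4.

```python
import re

# Matches per-partition lines such as:
#   2012-02-08 13:02:59,702 s4 INFO (Dispatcher.java:101) 0: 1869773
# The pattern is inferred from the quoted log, not from the S4 source.
PARTITION_LINE = re.compile(r"\(Dispatcher\.java:\d+\)\s+(\d+):\s+(\d+)\s*$")


def partition_counts(log_lines):
    """Return {partition: last reported count} from Dispatcher log lines."""
    counts = {}
    for line in log_lines:
        match = PARTITION_LINE.search(line)
        if match:
            counts[int(match.group(1))] = int(match.group(2))
    return counts


sample = [
    "2012-02-08 13:02:59,702 s4 INFO (Dispatcher.java:95) Event count is 0;",
    "2012-02-08 13:02:59,702 s4 INFO (Dispatcher.java:101) 0: 1869773",
    "2012-02-08 13:02:59,702 s4 INFO (Dispatcher.java:101) 1: 3322779",
    "2012-02-08 13:02:59,703 s4 INFO (Dispatcher.java:101) 2: 6098629",
    "2012-02-08 13:02:59,703 s4 INFO (Dispatcher.java:101) 3: 1753024",
]

if __name__ == "__main__":
    for partition, count in sorted(partition_counts(sample).items()):
        print(f"partition {partition}: {count}")
```

When a log contains several reports, the sketch keeps only the most recent
count for each partition.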

The configuration looks OK: events seem to leave partition 0, but the other
partitions don't receive any events, including those from the adapter. So it
looks as if node mach1 has no connectivity with the other machines. Can you
check that?
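
As a first step, it may help to confirm that mach1 can at least resolve the
host names listed in clusters.xml. The sketch below assumes the config layout
shown in the quoted message; the helper names cluster_nodes and resolve are
made up for illustration. Name resolution alone does not prove that the S4
ports are reachable, so firewall and routing checks are still needed.

```python
import socket
import xml.etree.ElementTree as ET

# Shortened copy of the clusters.xml layout from the quoted message.
SAMPLE_CONFIG = """
<config version="-1">
  <cluster name="s4" type="s4" mode="unicast">
    <node>
      <partition>0</partition>
      <machine>mach1.company.co</machine>
      <port>5077</port>
      <taskId>s4node-0</taskId>
    </node>
    <node>
      <partition>1</partition>
      <machine>mach2.company.co</machine>
      <port>5077</port>
      <taskId>s4node-1</taskId>
    </node>
  </cluster>
</config>
"""


def cluster_nodes(xml_text, cluster_name="s4"):
    """Return (partition, machine, port) tuples for the named cluster."""
    root = ET.fromstring(xml_text)
    nodes = []
    for cluster in root.findall("cluster"):
        if cluster.get("name") != cluster_name:
            continue
        for node in cluster.findall("node"):
            nodes.append((
                int(node.findtext("partition")),
                node.findtext("machine"),
                int(node.findtext("port")),
            ))
    return nodes


def resolve(host):
    """Return the IPv4 address for host, or None if it does not resolve."""
    try:
        return socket.gethostbyname(host)
    except OSError:
        return None


if __name__ == "__main__":
    for partition, machine, port in cluster_nodes(SAMPLE_CONFIG):
        address = resolve(machine)
        status = address if address else "DOES NOT RESOLVE"
        print(f"partition {partition}: {machine}:{port} -> {status}")
```

Run it on mach1 against the real clusters.xml; any host that does not resolve
there is a likely suspect.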

You could also get more information from the .mon log files. As someone
previously mentioned, the names are a bit cryptic (see the MetricsName class),
but it's easy to find what they are about by looking for them in the source
code.

Hope this helps,

Matthieu

On Wed, Feb 8, 2012 at 9:06 PM, Lunsford, Scot <slunsford@mitre.org> wrote:

>  I'm trying to run an S4 cluster using Zookeeper with 4 nodes, with node0
> also running a client-adapter. It doesn't appear that the other 3 nodes are
> receiving any PEs. The log on machine-0 says
>
>  2012-02-08 13:02:54,564 s4 INFO (PEContainer.java:348) PE count 690013
> … Count breakdown by PE removed …
> 2012-02-08 13:02:59,696 s4 INFO (Dispatcher.java:95) Event count is 0;
> rate 0.0
> 2012-02-08 13:02:59,696 s4 INFO (Dispatcher.java:97) Raw event count is 0;
> rate 0.0
> 2012-02-08 13:02:59,702 s4 INFO (Dispatcher.java:95) Event count is
> 13044223; rate 88.25489934675376
> 2012-02-08 13:02:59,702 s4 INFO (Dispatcher.java:97) Raw event count is
> 13044223; rate 88.25489934675376
> 2012-02-08 13:02:59,702 s4 INFO (Dispatcher.java:101) 0: 1869773
> 2012-02-08 13:02:59,702 s4 INFO (Dispatcher.java:101) 1: 3322779
> 2012-02-08 13:02:59,703 s4 INFO (Dispatcher.java:101) 2: 6098629
> 2012-02-08 13:02:59,703 s4 INFO (Dispatcher.java:101) 3: 1753024
> 2012-02-08 13:02:59,703 s4 INFO (Dispatcher.java:95) Event count is 0;
> rate 0.0
> 2012-02-08 13:03:00,999 s4 INFO (Dispatcher.java:97) Raw event count is 0;
> rate 0.0
> 2012-02-08 13:02:59,702 s4 INFO (Dispatcher.java:95) Event count is
> 1908351; rate 66.85329600746518
> 2012-02-08 13:03:01,002 s4 INFO (Dispatcher.java:97) Raw event count is
> 1908474; rate 66.85329600746518
> 2012-02-08 13:03:01,002 s4 INFO (Dispatcher.java:101) 0: 1908474
> 2012-02-08 13:03:01,002 s4 INFO (Dispatcher.java:101) 1: 1908474
> 2012-02-08 13:03:01,002 s4 INFO (Dispatcher.java:101) 2: 1908475
> 2012-02-08 13:03:01,002 s4 INFO (Dispatcher.java:101) 3: 1908475
>
>  What do the 0-3 numbers indicate? I assume this would be the number of
> events sent to each node.
> Why are there two Event and Raw event counts, one 0 and one N?
>
>  However, here is the log for machine-1 (machine 2 and 3 show identical
> logs)
>
>  2012-02-08 13:11:07,077 s4 INFO (PEContainer.java:348) PE count 0
> 2012-02-08 13:11:11,930 s4 INFO (Dispatcher.java:95) Event count is 0;
> rate 0.0
> 2012-02-08 13:11:11,930 s4 INFO (Dispatcher.java:97) Raw event count is 0;
> rate 0.0
> 2012-02-08 13:11:11,930 s4 INFO (Dispatcher.java:95) Event count is 0;
> rate 0.0
> 2012-02-08 13:11:11,930 s4 INFO (Dispatcher.java:97) Raw event count is 0;
> rate 0.0
> 2012-02-08 13:11:11,935 s4 INFO (Dispatcher.java:95) Event count is 0;
> rate 0.0
> 2012-02-08 13:11:11,935 s4 INFO (Dispatcher.java:97) Raw event count is 0;
> rate 0.0
> 2012-02-08 13:11:12,199 s4 INFO (Dispatcher.java:95) Event count is 0;
> rate 0.0
> 2012-02-08 13:11:12,199 s4 INFO (Dispatcher.java:97) Raw event count is 0;
> rate 0.0
>
>  So it would appear there are no events being processed on machines 1-3.
>
>  Zookeeper shows 4 tasks under ls /s4/s4/process
>
>  I load my config into zookeeper on the lead node only with the following
> command
> $S4_IMAGE/scripts/task-setup.sh localhost:2181 clean setup
> $S4_IMAGE/s4-core/conf/dynamic/clusters.xml
>
>  <config version="-1">
>   <cluster name="s4" type="s4" mode="unicast">
>     <node>
>       <partition>0</partition>
>       <machine>mach1.company.co</machine>
>       <port>5077</port>
>       <taskId>s4node-0</taskId>
>     </node>
>     <node>
>       <partition>1</partition>
>       <machine>mach2.company.co</machine>
>       <port>5077</port>
>       <taskId>s4node-1</taskId>
>     </node>
>     <node>
>       <partition>2</partition>
>       <machine>mach3.company.co</machine>
>       <port>5077</port>
>       <taskId>s4node-2</taskId>
>     </node>
>     <node>
>       <partition>3</partition>
>       <machine>mach4.company.co</machine>
>       <port>5077</port>
>       <taskId>s4node-3</taskId>
>     </node>
>   </cluster>
>   <cluster name="client-adapter" type="s4" mode="unicast">
>     <node>
>       <partition>0</partition>
>       <machine>mach1.company.co</machine>
>       <taskId>client-adapter-0</taskId>
>       <port>6077</port>
>     </node>
>   </cluster>
> </config>
>
>  Any ideas what's keeping events from being dispatched to the other nodes?
>
>  Thanks,
> Scot
>
>
