incubator-s4-user mailing list archives

From Matthieu Morel <mmo...@apache.org>
Subject Re: Zookeeper problem
Date Wed, 05 Sep 2012 15:30:27 GMT
Hi,

This is odd. Session expiration is triggered by Zookeeper itself, when
it considers a client (here, an S4 node) to be partitioned. It is not
expected to happen in normal operation.

In your case, it looks like the adapter lost connectivity with Zookeeper.

I am not familiar with your setup, but here are some things to check:
- Are there any relevant messages in the Zookeeper log?
- The error message comes from the adapter, right?
- Does the adapter keep sending messages, or does it hang at some point
(which could explain the loss of connectivity with Zookeeper)?
- When you check the status after the error message is received, is the
adapter node still present?
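
To rule out basic connectivity problems, a quick probe run from the
adapter host may help. The following is only a minimal sketch: it assumes
that the Zookeeper "four-letter word" admin commands (such as ruok) are
enabled on your server, and that Zookeeper listens on localhost:2181, as
the thread name in your log line suggests. The function names are made up
for illustration and are not part of S4 or Zookeeper.

```python
import socket


def four_letter_word(host, port, command, timeout=5.0):
    """Send a Zookeeper four-letter admin command and return the raw reply.

    Assumes the server answers and then closes the connection, which is
    how these admin commands normally behave.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(command.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # server closed the connection: reply is complete
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def is_zookeeper_alive(host, port, timeout=5.0):
    """Return True if the server replies 'imok' to 'ruok', else False."""
    try:
        return four_letter_word(host, port, "ruok", timeout) == "imok"
    except OSError:  # refused connection, timeout, unreachable host, ...
        return False


# Example use from the adapter host (address is an assumption):
#   print(is_zookeeper_alive("localhost", 2181))
#   print(four_letter_word("localhost", 2181, "cons"))
```

If the probe succeeds while the adapter is hanging, the network path is
probably fine and the problem is more likely inside the adapter process.
The same helper can send cons (client connections) or dump (sessions and
ephemeral nodes), provided your Zookeeper version exposes them.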

Let us know,

thanks,

Matthieu







On 9/5/12 3:52 PM, Davide Simoncelli wrote:
> Hello,
>
> I'm trying to run an application on a cluster with 10 nodes. There is
> also an adapter cluster with only one node.
> I noticed that the node in the adapter cluster sends events and is
> running (the top command shows that its java process is using the CPU).
> The other 10 nodes (all of them) don't receive anything, and the java
> process on each node doesn't even use the CPU. After a while, the
> following exception is thrown:
>
> [ZkClient-EventThread-27-localhost:2181] ERROR o.a.s4.comm.topology.ClustersFromZK - Zookeeper session expired, possibly due to a network partition for cluster [cluster1_adapter]. This node is considered as dead by Zookeeper. Proceeding to stop this node.
>
> There is no error when the clusters are created and the nodes are
> started. Also, the status command shows the following output, which
> leads me to assume everything is OK:
> App Status
> ----------------------------------------------------------------------------------------------------------------------------------
>          Name              Cluster                                                  URI
> ----------------------------------------------------------------------------------------------------------------------------------
>   testAppAdapter    cluster1_adapter  file:/home/s4-piper/testApp/build/libs/testAppAdapter.s4r
>       testApp                 cluster1      file:/tmp/testApp.s4r
> ----------------------------------------------------------------------------------------------------------------------------------
>
>
> Cluster Status
> ---------------------------------------------------------------------------
>                                          Active nodes
> Name              App             Tasks  Number  Task id  Host        Port
> ---------------------------------------------------------------------------
> cluster1_adapter  testAppAdapter  1      1       Task-0   computer1   13000
> cluster1          testApp         10     10      Task-6   computer2   12006
>                                                  Task-7   computer4   12007
>                                                  Task-4   computer6   12004
>                                                  Task-5   computer7   12005
>                                                  Task-2   computer9   12002
>                                                  Task-3   computer11  12003
>                                                  Task-0   computer17  12000
>                                                  Task-1   computer18  12001
>                                                  Task-9   computer23  12009
>                                                  Task-8   computer37  12008
> ---------------------------------------------------------------------------
>
>
>
> Stream Status
> -------------------------------------------------------------
> Name      Producers                         Consumers
> -------------------------------------------------------------
> RawlData  cluster1_adapter(testAppAdapter)  cluster1(testApp)
> -------------------------------------------------------------
>
> Could you help me?
>
> Thank you
>
> Regards
>
> - Davide
>

