incubator-s4-user mailing list archives

From Flavio Junqueira <...@yahoo-inc.com>
Subject Re: Zookeeper problem
Date Wed, 05 Sep 2012 15:38:21 GMT
Is this reproducible? Could you collect and share the zookeeper logs?

-Flavio

On Sep 5, 2012, at 3:52 PM, Davide Simoncelli wrote:

> Hello,
> 
> I'm trying to run an application on a cluster with 10 nodes. There is also an adapter cluster with only one node.
> What I noticed is that the node in the adapter cluster sends events and is running (the top command shows that its java process is using the CPU).
> The other 10 nodes (all of them) don't receive anything, and the java process on each node doesn't even use the CPU. After a while, the following exception is thrown:
> 
> [ZkClient-EventThread-27-localhost:2181] ERROR o.a.s4.comm.topology.ClustersFromZK - Zookeeper session expired, possibly due to a network partition for cluster [cluster1_adapter]. This node is considered as dead by Zookeeper. Proceeding to stop this node.
> 
> There is no error when the clusters are created and the nodes are started. Also, the status command shows the following output, which leads me to assume everything is OK:
> App Status
> ---------------------------------------------------------------------------------
> Name             Cluster            URI
> ---------------------------------------------------------------------------------
> testAppAdapter   cluster1_adapter   file:/home/s4-piper/testApp/build/libs/testAppAdapter.s4r
> testApp          cluster1           file:/tmp/testApp.s4r
> ---------------------------------------------------------------------------------
> 
> 
> Cluster Status
> ---------------------------------------------------------------------------------
>                                                         Active nodes
> Name               App              Tasks   -------------------------------------
>                                             Number   Task id   Host         Port
> ---------------------------------------------------------------------------------
> cluster1_adapter   testAppAdapter   1       1        Task-0    computer1    13000
> cluster1           testApp          10      10       Task-6    computer2    12006
>                                                      Task-7    computer4    12007
>                                                      Task-4    computer6    12004
>                                                      Task-5    computer7    12005
>                                                      Task-2    computer9    12002
>                                                      Task-3    computer11   12003
>                                                      Task-0    computer17   12000
>                                                      Task-1    computer18   12001
>                                                      Task-9    computer23   12009
>                                                      Task-8    computer37   12008
> ---------------------------------------------------------------------------------
> 
> 
> 
> Stream Status
> ---------------------------------------------------------------------------------
> Name       Producers                          Consumers
> ---------------------------------------------------------------------------------
> RawlData   cluster1_adapter(testAppAdapter)   cluster1(testApp)
> ---------------------------------------------------------------------------------
> 
> Could you help me?
> 
> Thank you
> 
> Regards
> 
> - Davide
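
While collecting the ZooKeeper logs requested above, it may also help to confirm that each node can actually reach the ZooKeeper server named in the error (localhost:2181). The following is only a sketch, not part of S4: the function name `zk_four_letter` is made up for illustration, and it assumes the server accepts ZooKeeper's four-letter admin commands such as `ruok` (ZooKeeper 3.5.3 and later require these to be whitelisted in the server configuration).

```python
import socket


def zk_four_letter(host: str, port: int, cmd: str = "ruok", timeout: float = 5.0) -> str:
    """Send a ZooKeeper four-letter command and return the raw text reply.

    Hypothetical helper for illustration only. A healthy server is
    expected to answer "imok" to "ruok" and then close the connection.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(cmd.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # server closed the connection: reply is complete
                break
            chunks.append(data)
    return b"".join(chunks).decode("ascii", errors="replace")


# Example usage, run on each node against the ZooKeeper address it is
# configured with (localhost:2181 in the error message above):
#
#     print(zk_four_letter("localhost", 2181))
```

If the call times out or the connection is refused on the 10 processing nodes but works on the adapter node, that would point at connectivity to ZooKeeper rather than at the application itself. It does not replace the server logs, which are still needed to see why the session expired.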

