incubator-s4-user mailing list archives

From Aimee Cheng <chengsj....@gmail.com>
Subject Re: Zookeeper problem
Date Wed, 05 Sep 2012 15:51:40 GMT
Hi,

We have run into a similar case: the application was still running, but JVM memory usage
was very high, which caused long full GC pauses, so other threads (e.g. zkClient) hung.
If a full GC pause lasts longer than the session timeout, ZooKeeper considers the session
expired.

Maybe you can check your adapter application.
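
A rough sketch of one way to check this from inside the JVM: sample the accumulated GC time
over an interval and compare it with the session timeout. The class name, the method names
and the 10-second timeout are made up for illustration; use the timeout your ZooKeeper
client is actually configured with. The accumulated time is only an upper bound on the
longest single pause in the interval, so treat a hit as a hint, not a proof.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

class GcPauseCheck {
    // Hypothetical value for illustration; not taken from any S4 or ZooKeeper default.
    static final long SESSION_TIMEOUT_MS = 10_000;

    /** Total time in ms this JVM has spent in GC so far, summed over all collectors. */
    static long totalGcTimeMs() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // returns -1 if the collector does not report it
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    /** True if the GC time observed in one interval is at least the session timeout. */
    static boolean mayExpireSession(long gcTimeDeltaMs, long sessionTimeoutMs) {
        return gcTimeDeltaMs >= sessionTimeoutMs;
    }

    public static void main(String[] args) throws InterruptedException {
        long before = totalGcTimeMs();
        Thread.sleep(1000); // sampling interval; a real check would loop in a background thread
        long delta = totalGcTimeMs() - before;
        System.out.println("GC time in last interval: " + delta + " ms");
        if (mayExpireSession(delta, SESSION_TIMEOUT_MS)) {
            System.out.println("GC time exceeded the session timeout; the session may have expired");
        }
    }
}
```

Starting the adapter JVM with -verbose:gc also logs each collection with its duration, which
shows individual pause lengths directly.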

Hope this helps.

-Aimee


On Sep 5, 2012, at 9:52 PM, Davide Simoncelli wrote:

> Hello,
> 
> I'm trying to run an application on a cluster with 10 nodes. There is also an adapter
> cluster with only one node.
> What I noticed is that the node in the adapter cluster is running and sends events
> (the top command shows that the java process is using the CPU).
> The other 10 nodes (all of them) don't receive anything, and the java process on each
> node doesn't even use the CPU. After a while the following exception is thrown:
> 
> [ZkClient-EventThread-27-localhost:2181] ERROR o.a.s4.comm.topology.ClustersFromZK -
> Zookeeper session expired, possibly due to a network partition for cluster [cluster1_adapter].
> This node is considered as dead by Zookeeper. Proceeding to stop this node.
> 
> There is no error when the clusters are created and the nodes are started. Also, the status
> command shows the following output, which leads me to assume everything is OK:
> App Status
> ---------------------------------------------------------------------------------------------
> Name              Cluster           URI
> ---------------------------------------------------------------------------------------------
> testAppAdapter    cluster1_adapter  file:/home/s4-piper/testApp/build/libs/testAppAdapter.s4r
> testApp           cluster1          file:/tmp/testApp.s4r
> ---------------------------------------------------------------------------------------------
> 
> 
> Cluster Status
> ---------------------------------------------------------------------------
>                                          Active nodes
> Name              App             Tasks  ----------------------------------
>                                          Number  Task id  Host        Port
> ---------------------------------------------------------------------------
> cluster1_adapter  testAppAdapter  1      1       Task-0   computer1   13000
> cluster1          testApp         10     10      Task-6   computer2   12006
>                                                  Task-7   computer4   12007
>                                                  Task-4   computer6   12004
>                                                  Task-5   computer7   12005
>                                                  Task-2   computer9   12002
>                                                  Task-3   computer11  12003
>                                                  Task-0   computer17  12000
>                                                  Task-1   computer18  12001
>                                                  Task-9   computer23  12009
>                                                  Task-8   computer37  12008
> ---------------------------------------------------------------------------
> 
> 
> 
> Stream Status
> -------------------------------------------------------------
> Name      Producers                         Consumers
> -------------------------------------------------------------
> RawlData  cluster1_adapter(testAppAdapter)  cluster1(testApp)
> -------------------------------------------------------------
> 
> Could you help me?
> 
> Thank you
> 
> Regards
> 
> - Davide

