incubator-s4-user mailing list archives

From Flavio Junqueira <...@yahoo-inc.com>
Subject Re: Zookeeper problem
Date Wed, 05 Sep 2012 16:25:43 GMT
Yes, GC is a candidate.

-Flavio

On Sep 5, 2012, at 5:51 PM, Aimee Cheng wrote:

> Hi,
> 
> We have run into this case before: the application is still running, but JVM memory usage is very high, which causes long full GC pauses in the application, so other threads (e.g. zkClient) hang. If a full GC takes longer than the session timeout, ZooKeeper will consider the session expired.
> 
> Maybe you can check your adapter application.
> 
> Hope this helps.
> 
> -Aimee
> 
> 
> On Sep 5, 2012, at 9:52 PM, Davide Simoncelli wrote:
> 
>> Hello,
>> 
>> I'm trying to run an application on a cluster with 10 nodes. There is also an adapter cluster with only one node.
>> What I noticed is that the node in the adapter cluster sends events and is running (the top command shows that the java process is using the CPU).
>> The other 10 nodes (all of them) don't receive anything, and the java process on each node doesn't even use the CPU. After a while the following exception is thrown:
>> 
>> [ZkClient-EventThread-27-localhost:2181] ERROR o.a.s4.comm.topology.ClustersFromZK - Zookeeper session expired, possibly due to a network partition for cluster [cluster1_adapter]. This node is considered as dead by Zookeeper. Proceeding to stop this node.
>> 
>> There is no error when clusters are created and nodes are started. Also, the status command shows the following output, which leads me to assume everything is OK:
>> App Status
>> -------------------------------------------------------------------------------------------
>> Name            Cluster           URI
>> -------------------------------------------------------------------------------------------
>> testAppAdapter  cluster1_adapter  file:/home/s4-piper/testApp/build/libs/testAppAdapter.s4r
>> testApp         cluster1          file:/tmp/testApp.s4r
>> -------------------------------------------------------------------------------------------
>> 
>> 
>> Cluster Status
>> -------------------------------------------------------------------------------------------
>>                                          Active nodes
>> Name              App             Tasks  ----------------------------------
>>                                          Number  Task id  Host        Port
>> -------------------------------------------------------------------------------------------
>> cluster1_adapter  testAppAdapter  1      1       Task-0   computer1   13000
>> cluster1          testApp         10     10      Task-6   computer2   12006
>>                                                  Task-7   computer4   12007
>>                                                  Task-4   computer6   12004
>>                                                  Task-5   computer7   12005
>>                                                  Task-2   computer9   12002
>>                                                  Task-3   computer11  12003
>>                                                  Task-0   computer17  12000
>>                                                  Task-1   computer18  12001
>>                                                  Task-9   computer23  12009
>>                                                  Task-8   computer37  12008
>> -------------------------------------------------------------------------------------------
>> 
>> 
>> 
>> Stream Status
>> -------------------------------------------------------------------------------------------
>> Name      Producers                         Consumers
>> -------------------------------------------------------------------------------------------
>> RawlData  cluster1_adapter(testAppAdapter)  cluster1(testApp)
>> -------------------------------------------------------------------------------------------
>> 
>> Could you help me?
>> 
>> Thank you
>> 
>> Regards
>> 
>> - Davide
> 
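
One way to check the GC explanation given in this thread is to measure, inside the affected JVM, how long threads are actually stalled and how much time the JVM reports spending in garbage collection. The following is only a rough sketch using the standard java.lang.management API. The class name GcPauseCheck, its method names, and the 10-second SESSION_TIMEOUT_MS are illustrative assumptions and are not part of S4 or ZooKeeper; substitute the session timeout your client is really configured with. The comparison is also a simplification, since a session can expire after a stall somewhat shorter than the timeout if heartbeats were already overdue.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

class GcPauseCheck {
    // Assumed value for illustration only; use your real ZooKeeper session timeout.
    static final long SESSION_TIMEOUT_MS = 10_000;

    /** True if a stall of pauseMs is longer than the given session timeout. */
    static boolean exceedsSessionTimeout(long pauseMs, long sessionTimeoutMs) {
        return pauseMs > sessionTimeoutMs;
    }

    /** Cumulative time in ms the JVM reports having spent in GC, over all collectors. */
    static long totalGcTimeMs() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 when the collector does not report it
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    /** Sleeps for intervalMs and returns how many ms longer than that the thread was stalled. */
    static long measureStallMs(long intervalMs) throws InterruptedException {
        long start = System.nanoTime();
        Thread.sleep(intervalMs);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        return Math.max(0, elapsedMs - intervalMs);
    }

    public static void main(String[] args) throws InterruptedException {
        long gcBefore = totalGcTimeMs();
        long stallMs = measureStallMs(500);
        long gcDeltaMs = totalGcTimeMs() - gcBefore;
        System.out.println("stall=" + stallMs + "ms, GC time during interval=" + gcDeltaMs + "ms");
        if (exceedsSessionTimeout(stallMs, SESSION_TIMEOUT_MS)) {
            System.out.println("Stall is longer than the session timeout; the session would likely expire.");
        }
    }
}
```

Running the measurement in a loop on a background thread of the adapter process, or simply enabling GC logging on that JVM, would show whether stalls approach the session timeout around the time the "session expired" error is logged.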

