incubator-s4-user mailing list archives

From Aimee Cheng <chengsj....@gmail.com>
Subject Re: Zookeeper problem
Date Thu, 06 Sep 2012 08:11:07 GMT
Hi Davide,

Yes, in our case it happens after the application has been running for a long time. Something
else may be causing your threads to hang, but I am not sure about your case.
I think you need to check the logs to see whether the adapter keeps sending messages or whether
there are pauses between messages.
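
To check the GC theory from inside the JVM, something like the sketch below could be adapted into the adapter. It is only an illustration: the class name GcPauseWatcher, the 30-second session timeout, and the 1-second sampling interval are assumptions, not anything taken from S4 or ZooKeeper. It uses only the standard java.lang.management API. Note that the reported collection time is an approximation and, for concurrent collectors, is not purely stop-the-world time.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper for illustration; not part of S4 or ZooKeeper.
final class GcPauseWatcher {

    // Sum of the accumulated collection time (ms) over all collectors in this JVM.
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 when the collector does not report it
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    // True when the GC time spent between two samples reaches the session timeout.
    static boolean mayExpireSession(long gcMillisBefore, long gcMillisAfter,
                                    long sessionTimeoutMillis) {
        return (gcMillisAfter - gcMillisBefore) >= sessionTimeoutMillis;
    }

    public static void main(String[] args) throws InterruptedException {
        long sessionTimeoutMillis = 30_000; // assumed; use the timeout your ZK client is configured with
        long before = totalGcMillis();
        Thread.sleep(1_000);                // assumed sampling interval
        long after = totalGcMillis();
        System.out.println("GC time during last interval: " + (after - before) + " ms");
        if (mayExpireSession(before, after, sessionTimeoutMillis)) {
            System.out.println("GC time reached the session timeout; the session may have expired");
        }
    }
}
```

Without changing any code, starting the JVM with -verbose:gc also prints each collection, so long full GCs can be matched against the time of the session-expired error.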


-Aimee
 
On Sep 6, 2012, at 3:00 PM, Davide Simoncelli wrote:

> Hi,
> 
> Does it happen on the adapter node?
> 
> I have experienced this behavior when the application starts, not after a period
> of time (I suppose JVM memory usage only grows after a while).
> 
> Regards
> 
> - Davide
> 
> On Wednesday, September 05, 2012 06:25:43 PM Flavio Junqueira wrote:
>> Yes, GC is a candidate.
>> 
>> -Flavio
>> 
>> On Sep 5, 2012, at 5:51 PM, Aimee Cheng wrote:
>>> Hi,
>>> 
>>> We have run into a similar case: the application is still running, but its
>>> JVM memory usage is very high, which causes long full GC pauses in the
>>> application, so other threads (e.g. zkClient) hang. If a full GC pause
>>> lasts longer than the session timeout, ZooKeeper will consider the session
>>> expired.
>>> 
>>> Maybe you can check your adapter application.
>>> 
>>> Hope this helps.
>>> 
>>> -Aimee
>>> 
>>> On Sep 5, 2012, at 9:52 PM, Davide Simoncelli wrote:
>>>> Hello,
>>>> 
>>>> I'm trying to run an application on a cluster with 10 nodes. There is
>>>> also an adapter cluster with only one node. What I noticed is that the
>>>> node in the adapter cluster sends events and is running (the top command
>>>> shows that the java process is using the CPU). The other 10 nodes (all
>>>> of them) don't receive anything, and the java process on each of them
>>>> doesn't even use the CPU. After a while the following exception is
>>>> thrown:
>>>> 
>>>> [ZkClient-EventThread-27-localhost:2181] ERROR
>>>> o.a.s4.comm.topology.ClustersFromZK - Zookeeper session expired,
>>>> possibly due to a network partition for cluster [cluster1_adapter]. This
>>>> node is considered as dead by Zookeeper. Proceeding to stop this node.
>>>> 
>>>> There is no error when the clusters are created and the nodes are started.
>>>> Also, the status command shows the following output, which leads me to
>>>> assume everything is OK:
>>>> 
>>>> App Status
>>>> ----------------------------------------------------------------------------------
>>>>   Name             Cluster            URI
>>>> ----------------------------------------------------------------------------------
>>>>   testAppAdapter   cluster1_adapter   file:/home/s4-piper/testApp/build/libs/testAppAdapter.s4r
>>>>   testApp          cluster1           file:/tmp/testApp.s4r
>>>> ----------------------------------------------------------------------------------
>>>> 
>>>> 
>>>> Cluster Status
>>>> ----------------------------------------------------------------------------------
>>>>                                               Active nodes
>>>>   Name               App              Tasks   ------------------------------------
>>>>                                               Number   Task id   Host         Port
>>>> ----------------------------------------------------------------------------------
>>>>   cluster1_adapter   testAppAdapter   1       1        Task-0    computer1    13000
>>>>   cluster1           testApp          10      10       Task-6    computer2    12006
>>>>                                                        Task-7    computer4    12007
>>>>                                                        Task-4    computer6    12004
>>>>                                                        Task-5    computer7    12005
>>>>                                                        Task-2    computer9    12002
>>>>                                                        Task-3    computer11   12003
>>>>                                                        Task-0    computer17   12000
>>>>                                                        Task-1    computer18   12001
>>>>                                                        Task-9    computer23   12009
>>>>                                                        Task-8    computer37   12008
>>>> ----------------------------------------------------------------------------------
>>>> 
>>>> 
>>>> 
>>>> Stream Status
>>>> ----------------------------------------------------------------------------------
>>>>   Name       Producers                          Consumers
>>>> ----------------------------------------------------------------------------------
>>>>   RawlData   cluster1_adapter(testAppAdapter)   cluster1(testApp)
>>>> ----------------------------------------------------------------------------------
>>>> 
>>>> Could you help me?
>>>> 
>>>> Thank you
>>>> 
>>>> Regards
>>>> 
>>>> - Davide

