uima-user mailing list archives

From "reshu.agarwal" <reshu.agar...@orkash.com>
Subject Re: DUCC-unstable behaviour of ducc
Date Fri, 05 Dec 2014 11:31:38 GMT
Dear Lou,

Thanks for confirming this.

Is a version with the bug fix available for use?

What could cause the heartbeats to be delayed? On DUCC 1.1.0 the machines
fail to send heartbeats within 60 seconds, so the node is marked down, yet
DUCC 1.0.0 works fine on the same machines.

My master node is a physical machine and the client is a virtual machine.
Could this explain the delayed heartbeats as well as the increased job
processing time?

Thanks.

Reshu.
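
The heartbeat check discussed in this thread (a node is considered down when the time since its last heartbeat grows too large) can be sketched roughly as below. This is only an illustration and not DUCC's actual code: the function name `node_status` and the 120-second threshold are hypothetical, since the thread does not state the real cutoff. The sample ages 59 and 58 are the Heartbeat(last) values quoted further down.

```python
# Hypothetical cutoff: the thread never states the value DUCC really uses.
ASSUMED_DOWN_AFTER_SECONDS = 120


def node_status(seconds_since_heartbeat, down_after_seconds=ASSUMED_DOWN_AFTER_SECONDS):
    """Return "down" once the last heartbeat is older than the cutoff, else "up"."""
    if seconds_since_heartbeat > down_after_seconds:
        return "down"
    return "up"


# Heartbeat(last) values reported for the two nodes in the quoted table.
reported_ages = {"S144": 59, "S143": 58}
for name, age in reported_ages.items():
    print(name, age, node_status(age))
```

Under this assumed cutoff both reported ages count as healthy, so a heartbeat age near 60 seconds alone would not mark a node down in this sketch.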

On 12/05/2014 04:45 PM, Lou DeGenaro wrote:
> Each node has a DUCC Agent daemon that sends heartbeats.
>
> There was a bug discovered after the release of 1.1 whereby the share
> calculation is incorrect (a rounding up problem that you observe).  The
> impact of this bug should be minimal.  The bug has been fixed.
>
> Lou.
>
>
>
> On Fri, Dec 5, 2014 at 12:41 AM, reshu.agarwal <reshu.agarwal@orkash.com>
> wrote:
>
>> Lou,
>>
>> How does a node send heartbeats in DUCC? If you can tell me this, I will be
>> able to identify why my nodes are going down.
>>
>> The other problem which I am facing is:
>>
>> Memory(GB):total  : 31
>> Memory(GB):usable : 16
>> Shares:total      :  8
>> Shares:inuse      :  9
>>
>>
>> This means the RAM actually available is 30 GB, so 15 shares should be
>> available (2 GB per share), but it shows Memory(GB):usable : 16 and
>> Shares:total : 8.
>>
>> In DUCC 1.0.0, I don't face this problem.
>>
>> Please explain the reason for this.
>>
>> Reshu.
>>
>>
>>
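
The share arithmetic in the message above can be checked with a short sketch. It assumes the 2 GB share size stated in that message and simply divides memory by the share size. It does not reproduce DUCC's scheduler or the actual bug; the helper names are hypothetical. It only shows how the rounding direction changes the count.

```python
import math

SHARE_SIZE_GB = 2  # share size as stated in the message; treated here as an assumption


def whole_shares(memory_gb, share_size_gb=SHARE_SIZE_GB):
    """Complete shares that fit in memory_gb (rounds down)."""
    return memory_gb // share_size_gb


def shares_rounded_up(memory_gb, share_size_gb=SHARE_SIZE_GB):
    """Shares needed to cover memory_gb (rounds up)."""
    return math.ceil(memory_gb / share_size_gb)


print(whole_shares(31))       # total memory reported: 15 whole shares
print(whole_shares(16))       # usable memory reported: 8 whole shares
print(shares_rounded_up(31))  # rounding up instead gives 16
```

With these figures, the reported Shares:total of 8 matches the 16 GB usable value rather than the 31 GB total, and rounding up instead of down shifts a count by one.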
>> On 12/04/2014 06:42 PM, Lou DeGenaro wrote:
>>
>>> Which of these are not understandable?  If you hover over the column
>>> heading, a little more explanation is given (though not much).
>>>
>>> For example, if you hover over Heartbeat(last) you'll see "The elapsed
>>> time (in seconds) since the last heartbeat".  This should usually be
>>> around 60 seconds.  On the system I'm looking at live presently, I see a
>>> range from 9 to 66.  If the number gets too large, the DUCC system will
>>> consider the node down.  As best as I can tell, it looks like your numbers
>>> are 58 & 59, which is perfect.
>>>
>>> Lou.
>>>
>>> On Thu, Dec 4, 2014 at 7:41 AM, reshu.agarwal <reshu.agarwal@orkash.com>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> Please look at these stats:
>>>>
>>>> Status  Name  Memory(GB):usable  Memory(GB):total  Swap(GB):inuse  Swap(GB):free  Alien PIDs  Shares:total  Shares:inuse  Heartbeat(last)
>>>> Total                        58                70               0             29           9            29             3
>>>> up      S144                 36                39               0             20           8            18             2               59
>>>> down    S143                 22                31               0              9           1            11            11               58
>>>>
>>>> I am not able to understand these stats.
>>>>
>>>> Please help.
>>>>
>>>> Reshu.
>>>>
>>>>
>>>>

