uima-user mailing list archives

From Lou DeGenaro <lou.degen...@gmail.com>
Subject Re: DUCC-unstable behaviour of ducc
Date Mon, 08 Dec 2014 10:54:22 GMT
What is the "unstable behavior" of DUCC 1.1.0 when running it alone?

All kinds of bad things can happen if you run 2 DUCCs on the same set of
machines. I'm willing to help, but am not sure I can if you are running 2
DUCCs - that's fairly complex.  Instead I urge you to run a single DUCC
1.1.0 and let's try to fix what's wrong with running it alone.

Lou.

On Sun, Dec 7, 2014 at 11:40 PM, reshu.agarwal <reshu.agarwal@orkash.com>
wrote:

>
> Yes, I am running both at the same time. I first tried running only version
> 1.1.0 to check its performance, but due to its unstable behaviour I had to run
> DUCC 1.0.0 and DUCC 1.1.0 at the same time.  I am running DUCC 1.0.0 for jobs
> and DUCC 1.1.0 for testing purposes.
>
> Do I need to increase the heartbeat timing to more than 60 seconds?
>
> Reshu.
>
>
> On 12/05/2014 05:57 PM, Lou DeGenaro wrote:
>
>> You can fetch the latest code containing the bug fix from SVN and build
>> your own snapshot.  However, this bug is of minimal impact so there is no
>> pressing need to do so.
>>
>> Are you trying to run 1.0 and 1.1 at the same time?  This can be very
>> tricky.  You need to be sure of no overlaps.  I highly recommend that you
>> pick one or the other.
>>
>> Lou.
>>
>> On Fri, Dec 5, 2014 at 6:31 AM, reshu.agarwal <reshu.agarwal@orkash.com>
>> wrote:
>>
>>> Dear Lou,
>>>
>>> Thanks for confirming this.
>>>
>>> Is a version with the bug fix available for use?
>>>
>>> What could be causing the delay in heartbeats? The machines were not able
>>> to send heartbeats within 60 seconds, so the node goes down in DUCC 1.1.0,
>>> but DUCC 1.0.0 works fine on the same machines.
>>>
>>> My master node is a physical machine and the client is a virtual machine.
>>> Could this cause the heartbeat delays as well as the increase in job
>>> processing time?
>>>
>>> Thanks.
>>>
>>> Reshu.
>>>
>>>
>>> On 12/05/2014 04:45 PM, Lou DeGenaro wrote:
>>>
>>>> Each node has a DUCC Agent daemon that sends heartbeats.
>>>>
>>>> There was a bug discovered after the release of 1.1 whereby the share
>>>> calculation is incorrect (a rounding up problem that you observe).  The
>>>> impact of this bug should be minimal.  The bug has been fixed.
>>>>
>>>> Lou.
>>>>
>>>>
>>>>
>>>> On Fri, Dec 5, 2014 at 12:41 AM, reshu.agarwal <reshu.agarwal@orkash.com>
>>>> wrote:
>>>>
>>>>> Lou,
>>>>>
>>>>> How does a node send heartbeats in DUCC? If you can tell me this, I will be
>>>>> able to identify why my nodes are going down.
>>>>>
>>>>> The other problem I am facing is:
>>>>>
>>>>> Memory(GB):total  : 31
>>>>> Memory(GB):usable : 16
>>>>> Shares:total      :  8
>>>>> Shares:inuse      :  9
>>>>>
>>>>>
>>>>> That is, the RAM actually available is 30 GB, so 15 shares should be
>>>>> available (2 GB per share), but it is showing Memory(GB):usable : 16 and
>>>>> Shares:total : 8.
>>>>>
>>>>> In DUCC 1.0.0, I don't face this problem.
>>>>>
>>>>> Please explain the reason for this.
>>>>>
>>>>> Reshu.
>>>>>
>>>>>
>>>>>
>>>>> On 12/04/2014 06:42 PM, Lou DeGenaro wrote:
>>>>>
>>>>>> Which of these are not understandable?  If you hover over the column
>>>>>> heading, a little more explanation is given (though not much).
>>>>>>
>>>>>> For example, if you hover over Heartbeat(last) you'll see "The elapsed
>>>>>> time (in seconds) since the last heartbeat".  This should usually be
>>>>>> around 60 seconds.  On the system I'm looking at live presently, I see a
>>>>>> range from 9 to 66.  If the number gets too large, the DUCC system will
>>>>>> consider the node down.  As best as I can tell, it looks like your
>>>>>> numbers are 58 & 59, which is perfect.
>>>>>>
>>>>>> Lou.
>>>>>>
>>>>>> On Thu, Dec 4, 2014 at 7:41 AM, reshu.agarwal <reshu.agarwal@orkash.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> Please look at these stats:
>>>>>>>
>>>>>>> Status  Name   Memory(GB):usable  Memory(GB):total  Swap(GB):inuse  Swap(GB):free  Alien PIDs  Shares:total  Shares:inuse  Heartbeat(last)
>>>>>>>         Total                 58                70               0             29           9            29             3
>>>>>>> up      S144                  36                39               0             20           8            18             2               59
>>>>>>> down    S143                  22                31               0              9           1            11            11               58
>>>>>>> I am not able to understand these stats.
>>>>>>>
>>>>>>> Please help.
>>>>>>>
>>>>>>> Reshu.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>
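
The share figures quoted in the thread (2 GB per share, 31 GB total, 16 GB usable, 30 GB expected by Reshu) can be checked with a little arithmetic. The sketch below is only an illustration of how rounding down versus rounding up changes a share count; it is not DUCC's actual calculation, and the function names and the SHARE_QUANTUM_GB constant are hypothetical.

```python
import math

# Assumption: share size of 2 GB, as stated by Reshu in the thread.
SHARE_QUANTUM_GB = 2


def shares_rounded_down(memory_gb, quantum_gb=SHARE_QUANTUM_GB):
    """Whole shares that fit entirely inside memory_gb (hypothetical helper)."""
    return memory_gb // quantum_gb


def shares_rounded_up(memory_gb, quantum_gb=SHARE_QUANTUM_GB):
    """Shares needed to cover memory_gb, counting a partial share as a whole one."""
    return math.ceil(memory_gb / quantum_gb)


if __name__ == "__main__":
    print(shares_rounded_down(30))  # 15: the count Reshu expected from 30 GB
    print(shares_rounded_down(16))  # 8: consistent with Shares:total for 16 GB usable
    print(shares_rounded_down(31))  # 15
    print(shares_rounded_up(31))    # 16: rounding up yields one extra share
```

With 16 GB usable, 8 total shares is what 2 GB shares would give, so the reported total follows from the usable figure rather than from the 31 GB total. An in-use count of 9 against a total of 8 is the kind of off-by-one that a round-up can produce, which fits Lou's description of the bug, though the thread does not show the calculation itself.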
