uima-user mailing list archives

From "reshu.agarwal" <reshu.agar...@orkash.com>
Subject Re: DUCC-unstable behaviour od ducc
Date Wed, 10 Dec 2014 09:06:08 GMT
Dear Lou,

My problem has been resolved. I just increased the maximum time allowed
for receiving heartbeats from the agents.

The "unstable behavior" of DUCC 1.1.0 in my case was that nodes kept
going up and down, both when running a single instance of DUCC 1.1.0
and when running both DUCC versions simultaneously.

Now I am able to run DUCC 1.1.0 alone, so only DUCC 1.1.0 is
configured.

Thanks for your help. :-)

Reshu.
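
A minimal sketch of the heartbeat check discussed in this thread, showing
why raising the limit stops a slow node from being marked down. This is
not DUCC's code: `NodeStatus`, `classify`, the node name "example", and
the 60 and 120 second limits are hypothetical names and values chosen for
illustration. Only the 58, 59, and 66 second readings come from the
messages quoted below.

```python
from dataclasses import dataclass


@dataclass
class NodeStatus:
    name: str
    seconds_since_heartbeat: float  # the "Heartbeat(last)" column


def classify(node: NodeStatus, max_silence_seconds: float) -> str:
    """Return "up" while the last heartbeat is within the allowed silence."""
    if node.seconds_since_heartbeat <= max_silence_seconds:
        return "up"
    return "down"


# Readings quoted in the thread; the node called "example" is made up.
nodes = [
    NodeStatus("S144", 59),
    NodeStatus("S143", 58),
    NodeStatus("example", 66),
]

for limit in (60, 120):  # hypothetical limits, not DUCC defaults
    print(limit, {n.name: classify(n, limit) for n in nodes})
```

With the tighter limit, the node that last reported 66 seconds ago is
marked down even though it is still alive; with the looser limit all
three stay up.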



On 12/08/2014 04:24 PM, Lou DeGenaro wrote:
> What is the "unstable behavior" of DUCC 1.1.0 when running it alone?
>
> All kinds of bad things can happen if you run 2 DUCCs on the same set of
> machines. I'm willing to help, but am not sure I can if you are running 2
> DUCCs - that's fairly complex.  Instead I urge you to run a single DUCC
> 1.1.0 and let's try to fix what's wrong with running it alone.
>
> Lou.
>
> On Sun, Dec 7, 2014 at 11:40 PM, reshu.agarwal <reshu.agarwal@orkash.com>
> wrote:
>
>> Yes, I am running both at the same time. I first tried running only
>> version 1.1.0 to check its performance, but due to its unstable
>> behaviour I had to run DUCC 1.0.0 and DUCC 1.1.0 at the same time. I am
>> running DUCC 1.0.0 for running jobs and DUCC 1.1.0 for testing purposes.
>>
>> Do I need to increase the heartbeat timeout to more than 60 seconds?
>>
>> Reshu.
>>
>>
>> On 12/05/2014 05:57 PM, Lou DeGenaro wrote:
>>
>>> You can fetch the latest code containing the bug fix from SVN and build
>>> your own snapshot.  However, this bug is of minimal impact so there is no
>>> pressing need to do so.
>>>
>>> Are you trying to run 1.0 and 1.1 at the same time?  This can be very
>>> tricky.  You need to be sure of no overlaps.  I highly recommend that you
>>> pick one or the other.
>>>
>>> Lou.
>>>
>>> On Fri, Dec 5, 2014 at 6:31 AM, reshu.agarwal <reshu.agarwal@orkash.com>
>>> wrote:
>>>
>>>> Dear Lou,
>>>>
>>>> Thanks for confirming this.
>>>>
>>>> Is a version containing the bug fix available for use?
>>>>
>>>> What could be causing the delay in heartbeats? The machines were not
>>>> able to send heartbeats within 60 seconds, so the nodes go down in DUCC
>>>> 1.1.0, but DUCC 1.0.0 is working fine on the same machines.
>>>>
>>>> My master node is a physical machine and the client is a virtual one.
>>>> Could this be a reason for the delayed heartbeats as well as for the
>>>> increase in job processing time?
>>>>
>>>> Thanks.
>>>>
>>>> Reshu.
>>>>
>>>>
>>>> On 12/05/2014 04:45 PM, Lou DeGenaro wrote:
>>>>
>>>>> Each node has a DUCC Agent daemon that sends heartbeats.
>>>>> There was a bug discovered after the release of 1.1 whereby the share
>>>>> calculation is incorrect (a rounding up problem that you observe).  The
>>>>> impact of this bug should be minimal.  The bug has been fixed.
>>>>>
>>>>> Lou.
>>>>>
>>>>>
>>>>>
>>>>> On Fri, Dec 5, 2014 at 12:41 AM, reshu.agarwal
>>>>> <reshu.agarwal@orkash.com> wrote:
>>>>>
>>>>>> Lou,
>>>>>>
>>>>>> How does a node send heartbeats in DUCC? If you can tell me this, I
>>>>>> will be able to identify why my nodes are going down.
>>>>>>
>>>>>> The other problem which I am facing is:
>>>>>>
>>>>>> Memory(GB):total  : 31
>>>>>> Memory(GB):usable : 16
>>>>>> Shares:total      :  8
>>>>>> Shares:inuse      :  9
>>>>>>
>>>>>>
>>>>>> This means the RAM actually available is 30 GB, so the shares
>>>>>> available should be 15 (2 GB per share), but it is showing
>>>>>> Memory(GB):usable : 16 and Shares:total : 8.
>>>>>>
>>>>>> In DUCC 1.0.0, I don't face this problem.
>>>>>>
>>>>>> Please explain the reason for this.
>>>>>>
>>>>>> Reshu.
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 12/04/2014 06:42 PM, Lou DeGenaro wrote:
>>>>>>
>>>>>>> Which of these are not understandable?  If you hover over the column
>>>>>>> heading a little more explanation is given (though not much).
>>>>>>>
>>>>>>> For example, if you hover over Heartbeat(last) you'll see "The elapsed
>>>>>>> time (in seconds) since the last heartbeat".  This should usually be
>>>>>>> around 60 seconds.  On the system I'm looking at live presently, I see
>>>>>>> a range from 9 to 66.  If the number gets too large, the DUCC system
>>>>>>> will consider the node down.  As best as I can tell, it looks like
>>>>>>> your numbers are 58 & 59 which is perfect.
>>>>>>>
>>>>>>> Lou.
>>>>>>>
>>>>>>> On Thu, Dec 4, 2014 at 7:41 AM, reshu.agarwal
>>>>>>> <reshu.agarwal@orkash.com> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> Please look at these stats:
>>>>>>>>
>>>>>>>> Status  Name   Memory(GB)     Swap(GB)      Alien  Shares        Heartbeat
>>>>>>>>                usable  total  inuse  free   PIDs   total  inuse  (last)
>>>>>>>> Total          58      70     0      29     9      29     3
>>>>>>>> up      S144   36      39     0      20     8      18     2      59
>>>>>>>> down    S143   22      31     0      9      1      11     11     58
>>>>>>>>
>>>>>>>> I am not able to understand these stats.
>>>>>>>>
>>>>>>>> Please help.
>>>>>>>>
>>>>>>>> Reshu.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>

