uima-user mailing list archives

From Lou DeGenaro <lou.degen...@gmail.com>
Subject Re: DUCC-1.1.0: Machines are going down very frequently
Date Mon, 17 Nov 2014 11:58:13 GMT
Also, do all the daemons on the System -> Daemons page show status "up"?

Have a look at the Broker page of the live demo hosted at Apache:
http://uima-ducc-vm.apache.org:42133/system.broker.jsp and compare it with
yours.  Do all of the Topics appear with consumers > 0?

On Mon, Nov 17, 2014 at 6:48 AM, Lou DeGenaro <lou.degenaro@gmail.com>
wrote:

> Reshu,
>
> Have you tried looking at the log files in DUCC's log directory for signs
> of errors or exceptions?  Are any daemons producing core dumps?
>
> Lou.
>
> On Mon, Nov 17, 2014 at 1:21 AM, reshu.agarwal <reshu.agarwal@orkash.com>
> wrote:
>
>>
>> Dear Lou,
>>
>> I am using default configuration:
>>
>> ducc.agent.node.metrics.publish.rate=30000
>> ducc.rm.node.stability = 5
>>
>> Reshu.
>>
>>
>> On 11/12/2014 05:03 PM, Lou DeGenaro wrote:
>>
>>> What do you have defined in your ducc.properties for
>>> ducc.rm.node.stability and ducc.agent.node.metrics.publish.rate?  The
>>> Web Server considers a node down according to the following
>>> calculation:
>>>
>>> private long getAgentMillisMIA() {
>>>          String location = "getAgentMillisMIA";
>>>          long secondsMIA = DOWN_AFTER_SECONDS*SECONDS_PER_MILLI;
>>>          Properties properties = DuccWebProperties.get();
>>>          String s_tolerance = properties.getProperty("ducc.rm.node.stability");
>>>          String s_rate = properties.getProperty("ducc.agent.node.metrics.publish.rate");
>>>          try {
>>>              long tolerance = Long.parseLong(s_tolerance.trim());
>>>              long rate = Long.parseLong(s_rate.trim());
>>>              secondsMIA = (tolerance * rate) / 1000;
>>>          }
>>>          catch(Throwable t) {
>>>              logger.warn(location, jobid, t);
>>>          }
>>>          return secondsMIA;
>>>      }
>>>
>>> The default is 65 seconds. Note that the Web Server has no effect on
>>> actual operations in this case.  It is just a reporter of information.
>>>
>>> Lou.
>>>
>>> On Wed, Nov 12, 2014 at 12:45 AM, reshu.agarwal
>>> <reshu.agarwal@orkash.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> When I tried DUCC 1.1.0 on multiple machines, I ran into a problem with
>>>> the machines' up/down status. I have configured two machines, and they
>>>> go down one after the other. This causes the DUCC Services to be
>>>> disabled and Jobs to be initialized again and again.
>>>>
>>>> DUCC 1.0.0 was working fine on the same machines.
>>>>
>>>> How can I fix this problem? I have also compared the ducc.properties
>>>> files of both versions. Both use the same configuration for checking
>>>> heartbeats.
>>>>
>>>> Re-initialization of Jobs is increasing the processing time. Can I
>>>> change or re-configure this process?
>>>>
>>>> Services are getting disabled automatically and show an "excessive
>>>> initialization error" status when I mouse over the disabled status,
>>>> but the logs are not showing any error.
>>>>
>>>> I have to use DUCC 1.0.0 instead of DUCC 1.1.0.
>>>>
>>>> Thanks in Advance.
>>>>
>>>> --
>>>> *Reshu Agarwal*
>>>>
>>>>
>>
>
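
As a worked example of the calculation quoted above, the sketch below plugs in the two values Reshu reported (ducc.agent.node.metrics.publish.rate=30000 and ducc.rm.node.stability = 5). It is not the DUCC source: the class name AgentTimeoutSketch, the method secondsUntilDown, and the constant FALLBACK_SECONDS are made up for illustration. It assumes the publish rate is in milliseconds, as the division by 1000 in the quoted method suggests, and that the 65 seconds mentioned in the thread is what applies when a property is missing or unparsable.

```java
import java.util.Properties;

public class AgentTimeoutSketch {

    // Assumption: fallback used when a property is missing or not a number.
    // 65 is the default quoted in the thread; the constant name is hypothetical.
    static final long FALLBACK_SECONDS = 65;

    /**
     * Seconds without a node metrics publication after which the node
     * would be reported as down: (stability * publish rate in ms) / 1000.
     */
    public static long secondsUntilDown(Properties properties) {
        long seconds = FALLBACK_SECONDS;
        String tolerance = properties.getProperty("ducc.rm.node.stability");
        String rate = properties.getProperty("ducc.agent.node.metrics.publish.rate");
        try {
            seconds = (Long.parseLong(tolerance.trim()) * Long.parseLong(rate.trim())) / 1000;
        } catch (NumberFormatException | NullPointerException e) {
            // Missing or malformed property: keep the fallback value.
        }
        return seconds;
    }

    public static void main(String[] args) {
        Properties reported = new Properties();
        reported.setProperty("ducc.agent.node.metrics.publish.rate", "30000");
        reported.setProperty("ducc.rm.node.stability", "5");
        // 5 * 30000 ms / 1000 = 150 seconds
        System.out.println(secondsUntilDown(reported));
    }
}
```

Under these assumptions, the reported settings give 150 seconds: five consecutive 30-second publications would have to be missed before the Web Server shows the node as down.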
