uima-user mailing list archives

From Lou DeGenaro <lou.degen...@gmail.com>
Subject Re: DUCC-1.1.0: Machines are going down very frequently
Date Mon, 17 Nov 2014 11:48:31 GMT
Reshu,

Have you tried looking at the log files in DUCC's log directory for signs
of errors or exceptions?  Are any daemons producing core dumps?

Lou.

On Mon, Nov 17, 2014 at 1:21 AM, reshu.agarwal <reshu.agarwal@orkash.com>
wrote:

>
> Dear Lou,
>
> I am using the default configuration:
>
> ducc.agent.node.metrics.publish.rate=30000
> ducc.rm.node.stability = 5
>
> Reshu.
>
>
> On 11/12/2014 05:03 PM, Lou DeGenaro wrote:
>
>> What do you have defined in your ducc.properties for
>> ducc.rm.node.stability and ducc.agent.node.metrics.publish.rate?  The
>> Web Server considers a node down according to the following
>> calculation:
>>
>> private long getAgentMillisMIA() {
>>          String location = "getAgentMillisMIA";
>>          long secondsMIA = DOWN_AFTER_SECONDS*SECONDS_PER_MILLI;
>>          Properties properties = DuccWebProperties.get();
>>          String s_tolerance =
>>              properties.getProperty("ducc.rm.node.stability");
>>          String s_rate =
>>              properties.getProperty("ducc.agent.node.metrics.publish.rate");
>>          try {
>>              long tolerance = Long.parseLong(s_tolerance.trim());
>>              long rate = Long.parseLong(s_rate.trim());
>>              secondsMIA = (tolerance * rate) / 1000;
>>          }
>>          catch(Throwable t) {
>>              logger.warn(location, jobid, t);
>>          }
>>          return secondsMIA;
>>      }
>>
>> The default is 65 seconds. Note that the Web Server has no effect on
>> actual operations in this case.  It is just a reporter of information.
>>
>> Lou.
>>
>> On Wed, Nov 12, 2014 at 12:45 AM, reshu.agarwal
>> <reshu.agarwal@orkash.com> wrote:
>>
>>> Hi,
>>>
>>> When I tried DUCC-1.1.0 on multiple machines, I faced an up-down
>>> status problem with the machines. I have configured two machines, and
>>> they are going down one by one. This causes the DUCC Services to be
>>> disabled and Jobs to be initialized again and again.
>>>
>>> DUCC 1.0.0 was working fine on the same machines.
>>>
>>> How can I fix this problem? I have also compared the ducc.properties
>>> files for both versions. Both use the same configuration to check
>>> heartbeats.
>>>
>>> Re-initialization of Jobs is increasing the processing time. Can I
>>> change or re-configure this process?
>>>
>>> Services are getting disabled automatically and showing an excessive
>>> initialization error status on mouse-over of the disabled status, but
>>> the logs are not showing any errors.
>>>
>>> I have to use DUCC 1.0.0 instead of DUCC 1.1.0.
>>>
>>> Thanks in Advance.
>>>
>>> --
>>> *Reshu Agarwal*
>>>
>>>
>
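
As a rough check of the calculation quoted in the thread, the sketch below plugs in the values Reshu reports (ducc.rm.node.stability = 5 and ducc.agent.node.metrics.publish.rate = 30000). Only the formula (tolerance * rate) / 1000 is taken from the quoted getAgentMillisMIA() method; the class name NodeDownThreshold and the helper secondsUntilDown are hypothetical and are not part of DUCC.

```java
// Hypothetical stand-alone sketch; not DUCC code.
class NodeDownThreshold {

    // Mirrors the formula in the quoted method:
    // stability count times publish rate (milliseconds), converted to seconds.
    static long secondsUntilDown(String stability, String publishRateMillis) {
        long tolerance = Long.parseLong(stability.trim());
        long rate = Long.parseLong(publishRateMillis.trim());
        return (tolerance * rate) / 1000;
    }

    public static void main(String[] args) {
        // Values reported in the thread for the default configuration.
        System.out.println(secondsUntilDown("5", "30000")); // prints 150
    }
}
```

Assuming both properties parse successfully, these settings give a threshold of 150 seconds without a metrics publication before the Web Server reports a node as down. In the quoted method, the initial value built from DOWN_AFTER_SECONDS is returned only when parsing throws.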
