edgent-dev mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (QUARKS-66) Job monitoring application which restarts failed jobs
Date Fri, 01 Apr 2016 16:18:25 GMT

    [ https://issues.apache.org/jira/browse/QUARKS-66?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15221911#comment-15221911 ]

ASF GitHub Bot commented on QUARKS-66:

Github user dlaboss commented on a diff in the pull request:

    --- Diff: api/topology/src/main/java/quarks/topology/services/ApplicationService.java
    @@ -63,4 +65,11 @@ Licensed to the Apache Software Foundation (ASF) under one
          * @see ApplicationServiceMXBean
         void registerTopology(String applicationName, BiConsumer<Topology, JsonObject>
    --- End diff --
    For `registerTopology()` above, what are the requirements for the appName?  What happens
if a topology with that name is already registered?  Is a Job's appName already required to be
unique by definition elsewhere?  Regardless, it seems like it would help to add doc here to
clarify these points.
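To make the duplicate-name question concrete, here is a minimal sketch of one policy the doc could state: names must be non-empty and unique, and a second registration under the same name is rejected. `NameKeyedRegistry` and `isRegistered()` are hypothetical names for illustration, and the generic parameters stand in for the Quarks types; this is not how the pull request implements it.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BiConsumer;

// Hypothetical stand-in for whatever backs registerTopology(); not the Quarks implementation.
// T and C are placeholders for the topology and configuration types.
class NameKeyedRegistry<T, C> {
    private final Map<String, BiConsumer<T, C>> builders = new ConcurrentHashMap<>();

    // Assumed policy: the name must be non-empty and unique; a duplicate is rejected.
    void registerTopology(String applicationName, BiConsumer<T, C> builder) {
        if (applicationName == null || applicationName.isEmpty()) {
            throw new IllegalArgumentException("applicationName must be non-empty");
        }
        if (builder == null) {
            throw new NullPointerException("builder");
        }
        // putIfAbsent returns the previous mapping, so non-null means the name was taken.
        if (builders.putIfAbsent(applicationName, builder) != null) {
            throw new IllegalArgumentException("already registered: " + applicationName);
        }
    }

    boolean isRegistered(String applicationName) {
        return builders.containsKey(applicationName);
    }
}
```

Silently replacing the earlier builder would be an equally valid policy; the point is only that whichever behavior is chosen should be documented.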
    I'm also wondering about this "register with appName prior to submit" model versus, say,
"register the *Job* following the submit".  A post-submit registration scheme seems to leave
it to the system/provider-impl to decide what to use as an identifier to find the topology-builder
needed to rebuild/resubmit the job.  It also feels more logical to express "I want this Job
monitored" rather than "I want a/all jobs with this appName monitored"... though maybe that's
just me.  Does the pre-submit scheme handle recovery from certain startup failures that the
post-submit scheme can't?
    Is there also a need for an `unregisterTopology()`, or is an explicitly cancelled job
effectively unregistered automatically?
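A rough sketch of the post-submit alternative described above, under the assumption that cancelling a job implicitly ends its monitoring. Every name here (`PostSubmitMonitor`, `monitor()`, `onCancelled()`, `onUnhealthyClose()`, the nested `Job`) is hypothetical and not part of the Quarks API.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical sketch of "register the Job following the submit"; not Quarks API.
class PostSubmitMonitor {
    // Minimal stand-in for a submitted job.
    static final class Job {
        final String id;
        Job(String id) { this.id = id; }
    }

    // The monitor, not the caller, picks the lookup key (here, the job id).
    private final Map<String, Supplier<Job>> resubmitters = new ConcurrentHashMap<>();

    // "I want this Job monitored": called after submit has returned a Job.
    void monitor(Job job, Supplier<Job> resubmitter) {
        resubmitters.put(job.id, resubmitter);
    }

    // Assumed policy: an explicit cancel drops the registration.
    void onCancelled(Job job) {
        resubmitters.remove(job.id);
    }

    // An unhealthy close rebuilds and resubmits; the replacement inherits the registration.
    Job onUnhealthyClose(Job job) {
        Supplier<Job> resubmitter = resubmitters.remove(job.id);
        if (resubmitter == null) {
            return null; // the job was never monitored, or was cancelled
        }
        Job replacement = resubmitter.get();
        resubmitters.put(replacement.id, resubmitter);
        return replacement;
    }

    boolean isMonitored(Job job) {
        return resubmitters.containsKey(job.id);
    }
}
```

In this sketch a job must already exist before `monitor()` can be called, so a failure that happens before submit returns is not covered, which is the kind of startup failure the question about the pre-submit scheme is getting at.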

> Job monitoring application which restarts failed jobs
> -----------------------------------------------------
>                 Key: QUARKS-66
>                 URL: https://issues.apache.org/jira/browse/QUARKS-66
>             Project: Quarks
>          Issue Type: Task
>            Reporter: Victor Dogaru
>            Assignee: Victor Dogaru
>              Labels: failure-recovery
> An application which filters job events for jobs that closed in an unhealthy state
> and resubmits the applications associated with those jobs.

This message was sent by Atlassian JIRA
