edgent-dev mailing list archives

From "Dale LaBossiere (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (EDGENT-382) A RuntimeException thrown while processing a tuple brings down the whole topology
Date Tue, 14 Mar 2017 21:46:41 GMT

    [ https://issues.apache.org/jira/browse/EDGENT-382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15893024#comment-15893024 ]

Dale LaBossiere edited comment on EDGENT-382 at 3/14/17 9:46 PM:
-----------------------------------------------------------------

So, at a high level, was {{JobMonitorApp}} intended to be the resiliency answer?

Some questions come to mind:
- What was the thinking wrt uniformity of behavior among the providers? Is its absence from
DirectProvider just an omission?
- What was the thinking wrt uniformity of behavior within an app wrt such exceptions?  i.e.,
if any "application supplied function" throws, should the job get restarted?  (assuming the
Isolate and Barrier items noted above were addressed)
- What was the thinking wrt additional alternative behaviors, such as a less heavy-handed "just
log and continue"?
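
To make the "just log and continue" alternative concrete, here is a minimal sketch of a wrapper that catches a RuntimeException from an application-supplied function, logs it, and lets processing continue. It uses plain java.util.function types rather than Edgent's own function interfaces, and the class name, the method name and the "return null to drop the tuple" convention are assumptions made for this illustration only, not existing Edgent behavior.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.logging.Level;
import java.util.logging.Logger;

public class LogAndContinue {
    private static final Logger LOG = Logger.getLogger(LogAndContinue.class.getName());

    /**
     * Hypothetical helper: wraps fn so that a RuntimeException is logged
     * and the offending tuple is mapped to null instead of propagating.
     */
    public static <T, R> Function<T, R> logAndContinue(Function<T, R> fn) {
        return tuple -> {
            try {
                return fn.apply(tuple);
            } catch (RuntimeException e) {
                LOG.log(Level.WARNING, "Function failed for tuple " + tuple + "; dropping it", e);
                return null; // this sketch's convention for "drop the tuple"
            }
        };
    }

    public static void main(String[] args) {
        Function<String, Integer> parse = logAndContinue(Integer::parseInt);
        List<Integer> out = new ArrayList<>();
        for (String s : new String[] {"1", "oops", "3"}) {
            Integer v = parse.apply(s);
            if (v != null) {
                out.add(v); // only tuples that were processed successfully continue
            }
        }
        System.out.println(out); // prints [1, 3]
    }
}
```

Whether such wrapping should be opt-in per function or a runtime-wide policy is exactly the uniformity question raised above; the sketch only shows the per-function flavor.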

[update] I've come to appreciate that failed-job-restart support via JobMonitorApp can't be
transparently added to DirectProvider.  If it were present, users would have to change the
way they're building their topologies in order to leverage it.  That's because JobMonitorApp
requires that a topology builder be registered with an ApplicationService and DirectProvider
can't synthesize one for them.  Note most samples perform all topology construction in main().
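
A rough sketch of why restart needs a registered builder: to restart a failed job, the monitor must be able to build the topology again, which it cannot do if construction only happened once inside main(). The code below is plain Java, not Edgent API; RestartSketch, register() and runWithRestart() are hypothetical names, and a Supplier<Runnable> stands in for "a builder that produces a runnable job".

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

public class RestartSketch {
    // Hypothetical stand-in for a registry of named topology builders.
    private final Map<String, Supplier<Runnable>> builders = new HashMap<>();

    public void register(String name, Supplier<Runnable> builder) {
        builders.put(name, builder);
    }

    /**
     * Builds and runs the named job, rebuilding it after each failure.
     * Returns the number of attempts it took for a run to complete.
     */
    public int runWithRestart(String name, int maxAttempts) {
        Supplier<Runnable> builder = builders.get(name);
        if (builder == null) {
            throw new IllegalArgumentException("No builder registered for " + name);
        }
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                builder.get().run(); // a fresh job is built for every attempt
                return attempt;
            } catch (RuntimeException e) {
                System.err.println("Job " + name + " failed on attempt " + attempt + ": " + e.getMessage());
            }
        }
        throw new IllegalStateException("Job " + name + " failed " + maxAttempts + " times");
    }

    public static void main(String[] args) {
        RestartSketch monitor = new RestartSketch();
        int[] builds = {0};
        monitor.register("sensorApp", () -> {
            int n = ++builds[0];
            // The first build fails when run; the rebuilt job succeeds.
            return () -> {
                if (n == 1) {
                    throw new RuntimeException("transient failure");
                }
            };
        });
        System.out.println(monitor.runWithRestart("sensorApp", 3)); // prints 2
    }
}
```

An application that builds everything inline in main() hands the runtime only the finished job, so there is nothing to call for the second attempt.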



> A RuntimeException thrown while processing a tuple brings down the whole topology
> ---------------------------------------------------------------------------------
>
>                 Key: EDGENT-382
>                 URL: https://issues.apache.org/jira/browse/EDGENT-382
>             Project: Edgent
>          Issue Type: Bug
>          Components: Runtime
>            Reporter: Dale LaBossiere
>         Attachments: DlabossExceptionTest.java
>
>
> I encountered the above in the context of the WIoTP connector, and
> there may be a problem there as well, but it’s trivial to demonstrate the
> problem in a more general context.
> i.e., a RuntimeException thrown from a Topology.poll(), generate(), source() or from
> an unisolated user function implementation downstream of the source, like a map() or sink()'s
> function, causes the topology to immediately terminate.  That typically causes the process
> to terminate.
> It's unclear to me which parts of the runtime should be doing what with respect to this.
> Things need to be more resilient in the face of transient errors, particularly wrt transient
> connector problems.  As an example MqttPublisher.accept() achieved resiliency in the face
> of transient connection problems by logging instead of throwing.  IotpDevice connector just
> throws... which at a certain level is OK/desirable... if the runtime were to handle resiliency
> issues.
> Note, a RuntimeException from a Topology.events() supplier or even a downstream function
> doesn't result in topology termination.  That's because the runtime thread blocking awaiting
> the next supplied tuple doesn't see the RuntimeException.  And for the downstream case, the
> stream is Isolated so again the runtime thread doesn't see the exception.  That said, the
> thread internal to Isolate silently terminates in the face of a downstream exception.  ugh.
> (Barrier looks to have a similar problem).
> There needs to be some clear / prominent doc on all of this, what the design / behavior
> is supposed to be, and then we can address any issues in the light of that understanding.
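
The isolation effect described in the issue can be reproduced without Edgent. The sketch below is a simplified model and not the actual Isolate implementation: a producer hands tuples to a worker thread through a queue, so the producer never sees an exception thrown downstream, while the worker thread simply dies. IsolateSketch and workerSurvives() are hypothetical names introduced for this illustration.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;

public class IsolateSketch {
    /**
     * Feeds tuples 1..3 to a worker thread that calls downstream for each one.
     * Returns true if the worker thread is still alive afterwards.
     */
    public static boolean workerSurvives(Consumer<Integer> downstream) {
        BlockingQueue<Integer> queue = new LinkedBlockingQueue<>();
        Thread worker = new Thread(() -> {
            try {
                while (true) {
                    downstream.accept(queue.take()); // a RuntimeException here kills the worker
                }
            } catch (InterruptedException e) {
                // normal shutdown request
            }
        });
        worker.setDaemon(true);
        worker.start();

        for (int i = 1; i <= 3; i++) {
            queue.add(i); // the producer side never observes a downstream failure
        }
        try {
            worker.join(500); // returns early if the worker died
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        boolean alive = worker.isAlive();
        worker.interrupt();
        return alive;
    }

    public static void main(String[] args) {
        boolean failing = workerSurvives(t -> {
            if (t == 2) {
                throw new RuntimeException("bad tuple " + t);
            }
        });
        boolean healthy = workerSurvives(t -> { });
        System.out.println(failing); // prints false
        System.out.println(healthy); // prints true
    }
}
```

In this model the only trace of the failure is whatever the JVM's default uncaught-exception handler writes to stderr; tuple 3 stays in the queue and is never processed.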



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
