apex-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Thomas Weise (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (APEXCORE-703) Window processing timeout for finished/undeployed container
Date Sun, 16 Apr 2017 04:33:42 GMT

    [ https://issues.apache.org/jira/browse/APEXCORE-703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15970234#comment-15970234
] 

Thomas Weise commented on APEXCORE-703:
---------------------------------------

Vlad, thanks for checking. I don't think we can set the state to INACTIVE because the operator
needs to be recovered if there is a failure prior to reaching the checkpoint barrier. We would
have to check shutdownOperators before marking blocked?

> Window processing timeout for finished/undeployed container
> -----------------------------------------------------------
>
>                 Key: APEXCORE-703
>                 URL: https://issues.apache.org/jira/browse/APEXCORE-703
>             Project: Apache Apex Core
>          Issue Type: Bug
>    Affects Versions: 3.5.0
>            Reporter: Daniel Halperin
>
> Using Apex 3.5.0 with Apache Beam, I have a 10-container pipeline. The first container,
id #1, finishes and gets undeployed at 12:41:10 PM.
> Then, 60s later (at 12:42:10 PM), Apex decides that container is blocked because no data
has been received for 60s, declares failure, and restarts it.
> This would seem to be a bug -- shouldn't finished and undeployed operators be deregistered
from the timeout logic that is detecting stuck operators?
> Log below
> {code}
> Apr 14, 2017 12:41:10 PM com.datatorrent.stram.engine.StreamingContainer processHeartbeatResponse
> INFO: Undeploy request: [1]
> Apr 14, 2017 12:41:10 PM com.datatorrent.stram.engine.StreamingContainer undeploy
> INFO: Undeploy complete.
> Apr 14, 2017 12:42:10 PM com.datatorrent.stram.StreamingContainerManager updateRecoveryCheckpoints
> WARNING: Marking operator PTOperator[id=1,name=TextIO.Read/Read] blocked committed window
ffffffffffffffff, recovery window ffffffffffffffff, current time 1492198930012, last window
id change time 1492198869957, window processing timeout millis 60000
> Apr 14, 2017 12:42:10 PM com.datatorrent.stram.StreamingContainerManager updateCheckpoints
> INFO: Blocked operator PTOperator[id=1,name=TextIO.Read/Read] container PTContainer[id=1(container-6),state=ACTIVE]
time 60055ms
> Apr 14, 2017 12:42:11 PM com.datatorrent.stram.engine.StreamingContainer processHeartbeatResponse
> INFO: Received shutdown request
> Apr 14, 2017 12:42:11 PM com.datatorrent.stram.StramLocalCluster run
> INFO: Container container-6 restart.
> Apr 14, 2017 12:42:11 PM com.datatorrent.stram.StreamingContainerManager scheduleContainerRestart
> INFO: Initiating recovery for container-6@localhost
> Apr 14, 2017 12:42:11 PM com.datatorrent.stram.StreamingContainerManager updateRecoveryCheckpoints
> WARNING: Marking operator PTOperator[id=1,name=TextIO.Read/Read] blocked committed window
ffffffffffffffff, recovery window ffffffffffffffff, current time 1492198931015, last window
id change time 1492198869957, window processing timeout millis 60000
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

Mime
View raw message