apex-dev mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (APEXCORE-415) Input Operator Can Double Checkpoint if Operator CheckpointWindowCount is greater than DAG CheckpointWindowCount
Date Mon, 04 Apr 2016 18:11:25 GMT

    [ https://issues.apache.org/jira/browse/APEXCORE-415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15224689#comment-15224689 ]

ASF GitHub Bot commented on APEXCORE-415:
-----------------------------------------

Github user ilooner commented on the pull request:

    https://github.com/apache/incubator-apex-core/pull/292#issuecomment-205426336
  
    @davidyan74 There was a similar bug in GenericNode, which I fixed and which was merged earlier.
The problem happens when the operator CheckpointWindowCount is a multiple of the DAG CheckpointWindowCount.
In that case the operator is checkpointed twice for the same window: once on receiving the endWindow tuple
and once on receiving the checkpoint tuple for that window. When that happens, AsyncFSStorageAgent throws an
exception because two threads try to move the same file.
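The overlap follows from simple arithmetic: if the operator count is a multiple of the DAG count, every window that completes the operator's own count is also a DAG checkpoint window, so both paths fire for it. Below is a hypothetical sketch of a per-window guard, assuming both triggers end up calling one checkpoint method with the window id and that checkpoints fall on windows that are multiples of the counts. CheckpointGuardSketch, checkpoint and triggersCoincide are illustrative names, not the actual fix from the pull request.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch (not Apex code): models the two checkpoint triggers
 * described above and a guard that takes at most one checkpoint per window.
 */
public class CheckpointGuardSketch {

  private long lastCheckpointWindowId = -1;            // no checkpoint taken yet
  private final List<Long> taken = new ArrayList<>();  // window ids actually checkpointed

  /** Takes a checkpoint for windowId unless one was already taken for that window. */
  public boolean checkpoint(long windowId) {
    if (windowId == lastCheckpointWindowId) {
      return false;  // second trigger for the same window: skip it
    }
    lastCheckpointWindowId = windowId;
    taken.add(windowId);
    return true;
  }

  public List<Long> taken() {
    return taken;
  }

  /**
   * True when a window that completes the operator's own checkpoint count also
   * falls on a DAG checkpoint boundary (always the case when operatorCount is a
   * multiple of dagCount).
   */
  public static boolean triggersCoincide(long windowId, int operatorCount, int dagCount) {
    return windowId % operatorCount == 0 && windowId % dagCount == 0;
  }

  public static void main(String[] args) {
    int operatorCount = 4;  // assumed operator CheckpointWindowCount
    int dagCount = 2;       // assumed DAG CheckpointWindowCount
    CheckpointGuardSketch node = new CheckpointGuardSketch();
    for (long w = 1; w <= 8; w++) {
      if (triggersCoincide(w, operatorCount, dagCount)) {
        node.checkpoint(w);  // trigger 1: endWindow path
        node.checkpoint(w);  // trigger 2: checkpoint-tuple path, suppressed by the guard
      }
    }
    System.out.println(node.taken());  // [4, 8]
  }
}
```

Comparing against the last checkpointed window id is the smallest possible guard under the assumption that the duplicate trigger arrives before the next window is checkpointed; a set of window ids would also work but would need pruning.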


> Input Operator Can Double Checkpoint if Operator CheckpointWindowCount is greater than DAG CheckpointWindowCount
> ----------------------------------------------------------------------------------------------------------------
>
>                 Key: APEXCORE-415
>                 URL: https://issues.apache.org/jira/browse/APEXCORE-415
>             Project: Apache Apex Core
>          Issue Type: Bug
>            Reporter: Timothy Farkas
>            Assignee: Timothy Farkas
>
> An application that reproduces the issue is here:
> https://github.com/ilooner/streamcodec-bug/tree/asyncCheckpointBug
> java.lang.RuntimeException: java.util.concurrent.ExecutionException: java.lang.RuntimeException: java.io.FileNotFoundException: /disk6/ndevyarn/nm/usercache/tim/appcache/application_1456485348783_3429/container_1456485348783_3429_01_000019/tmp/chkp3241218411712328004/1/6268662011559673861 (No such file or directory)
> 	at com.datatorrent.netlet.util.DTThrowable.wrapIfChecked(DTThrowable.java:59)
> 	at com.datatorrent.stram.engine.Node.reportStats(Node.java:465)
> 	at com.datatorrent.stram.engine.InputNode.run(InputNode.java:156)
> 	at com.datatorrent.stram.engine.StreamingContainer$2.run(StreamingContainer.java:1388)
> Caused by: java.util.concurrent.ExecutionException: java.lang.RuntimeException: java.io.FileNotFoundException: /disk6/ndevyarn/nm/usercache/tim/appcache/application_1456485348783_3429/container_1456485348783_3429_01_000019/tmp/chkp3241218411712328004/1/6268662011559673861 (No such file or directory)
> 	at java.util.concurrent.FutureTask.report(FutureTask.java:122)
> 	at java.util.concurrent.FutureTask.get(FutureTask.java:188)
> 	at com.datatorrent.stram.engine.Node.reportStats(Node.java:458)
> 	... 2 more
> Caused by: java.lang.RuntimeException: java.io.FileNotFoundException: /disk6/ndevyarn/nm/usercache/tim/appcache/application_1456485348783_3429/container_1456485348783_3429_01_000019/tmp/chkp3241218411712328004/1/6268662011559673861 (No such file or directory)
> 	at com.datatorrent.netlet.util.DTThrowable.wrapIfChecked(DTThrowable.java:50)
> 	at com.datatorrent.netlet.util.DTThrowable.rethrow(DTThrowable.java:31)
> 	at com.datatorrent.common.util.AsyncFSStorageAgent.copyToHDFS(AsyncFSStorageAgent.java:126)
> 	at com.datatorrent.stram.engine.Node$CheckpointHandler.call(Node.java:684)
> 	at com.datatorrent.stram.engine.Node$CheckpointHandler.call(Node.java:673)
> 	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> 	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> 	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> 	at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.FileNotFoundException: /disk6/ndevyarn/nm/usercache/tim/appcache/application_1456485348783_3429/container_1456485348783_3429_01_000019/tmp/chkp3241218411712328004/1/6268662011559673861 (No such file or directory)
> 	at java.io.FileInputStream.open(Native Method)
> 	at java.io.FileInputStream.<init>(FileInputStream.java:146)
> 	at com.datatorrent.common.util.AsyncFSStorageAgent.copyToHDFS(AsyncFSStorageAgent.java:117)
> 	... 8 more
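The trace bottoms out in FileInputStream.open inside AsyncFSStorageAgent.copyToHDFS, i.e. the local checkpoint file was already gone when the second checkpoint task tried to read it. A minimal, hypothetical reproduction of that failure mode follows, assuming "move" means copy-then-delete and using a temp file in place of the real checkpoint path. DoubleMoveSketch, moveTask and runTwice are illustrative names, not Apex APIs. A single-thread executor runs the two tasks back to back so the outcome is deterministic; in the real bug the two threads race. The exception chain (ExecutionException from Future.get, wrapping a RuntimeException, wrapping FileNotFoundException) mirrors the one in the trace.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/**
 * Hypothetical reproduction of the failure mode in the stack trace above:
 * two checkpoint tasks "move" the same local temp file. Not Apex code.
 */
public class DoubleMoveSketch {

  /** Reads the local file (stand-in for copying it to HDFS), then deletes it. */
  public static Callable<Long> moveTask(File localFile) {
    return () -> {
      long bytes = 0;
      try (InputStream in = new FileInputStream(localFile)) {  // FileNotFoundException if already moved
        while (in.read() != -1) {
          bytes++;
        }
      } catch (IOException e) {
        throw new RuntimeException(e);  // checked exception wrapped, as in the trace
      }
      Files.delete(localFile.toPath());  // "move" = copy, then remove the local file
      return bytes;
    };
  }

  /** Walks the cause chain down to the innermost exception. */
  public static Throwable rootCause(Throwable t) {
    while (t.getCause() != null) {
      t = t.getCause();
    }
    return t;
  }

  /** Submits two move tasks for one file; returns the second task's root failure type, or "none". */
  public static String runTwice() {
    ExecutorService pool = Executors.newSingleThreadExecutor();
    try {
      File tmp = File.createTempFile("chkp", ".bin");
      Files.write(tmp.toPath(), "state".getBytes(StandardCharsets.UTF_8));
      Future<Long> first = pool.submit(moveTask(tmp));   // endWindow trigger
      Future<Long> second = pool.submit(moveTask(tmp));  // checkpoint-tuple trigger
      first.get();  // succeeds: 5 bytes copied, file deleted
      try {
        second.get();
        return "none";
      } catch (ExecutionException e) {
        return rootCause(e).getClass().getSimpleName();
      }
    } catch (IOException | InterruptedException | ExecutionException e) {
      throw new IllegalStateException("setup or first task failed", e);
    } finally {
      pool.shutdown();
    }
  }

  public static void main(String[] args) {
    System.out.println(runTwice());  // FileNotFoundException
  }
}
```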



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
