apex-dev mailing list archives

From "Munagala V. Ramanath (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (APEXMALHAR-2254) File input operator is not idempotent with closing files on replay
Date Tue, 04 Oct 2016 20:32:21 GMT

    [ https://issues.apache.org/jira/browse/APEXMALHAR-2254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15546572#comment-15546572 ]

Munagala V. Ramanath commented on APEXMALHAR-2254:

Here is a list of other JIRAs related to this operator:
APEXMALHAR-2250 AbstractFileInputOperator.DirectoryScanner does not handle directories correctly.
APEXMALHAR-2270 AbstractFileInputOperator: During replay, inputStream should skip tuples
APEXMALHAR-2269 AbstractFileInputOperator: During replay, IO errors not handled
APEXMALHAR-2263 Offsets in AbstractFileInputOperator should be long rather than int
APEXMALHAR-2021 Add property to AbstractFileInputOperator to trim processedFiles and ignoredFiles
APEXMALHAR-2268 AbstractFileInputOperator: During replay, readEntity may be called without calling openFile.
APEXMALHAR-2274 AbstractFileInputOperator gets killed when there are a large number of files.

> File input operator is not idempotent with closing files on replay
> ------------------------------------------------------------------
>                 Key: APEXMALHAR-2254
>                 URL: https://issues.apache.org/jira/browse/APEXMALHAR-2254
>             Project: Apache Apex Malhar
>          Issue Type: Bug
>            Reporter: Pramod Immaneni
>            Assignee: Pramod Immaneni
> With the file input operator, a replay after a failure outputs the same data as before the failure for every window that is replayed after the checkpoint. To do this, the operator keeps track of the files and offsets for every window and replays the data based on them.
> However, suppose that before the failure a file was fully processed and closed just before the end of a window, and the next file was opened and processed in a new window. On replay, the first file is closed not in the earlier window but in the later one. This can cause problems if an operator depends on the closing of files also happening in an idempotent manner.
> Improve the operator to save the closing and opening of files in the idempotent state as well, so that these events also happen in an idempotent manner.
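
To make the proposed improvement concrete, here is a minimal sketch of recording open, read, and close events per window so that a replay repeats them in the window where they originally happened. It is not the Malhar implementation: the class FileEventLog, the Kind enum, and the record and replay methods are all hypothetical names made up for this illustration, and it assumes window IDs are plain long values.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; not part of AbstractFileInputOperator.
public class FileEventLog {
    enum Kind { OPEN, READ, CLOSE }

    static final class Event {
        final Kind kind;
        final String path;
        final long offset; // long rather than int, so large files do not overflow

        Event(Kind kind, String path, long offset) {
            this.kind = kind;
            this.path = path;
            this.offset = offset;
        }

        @Override
        public String toString() {
            return kind + "(" + path + "@" + offset + ")";
        }
    }

    // Events keyed by window ID, kept in the order they occurred in that window.
    private final Map<Long, List<Event>> eventsByWindow = new TreeMap<>();

    // Called during normal processing: remember what happened in this window.
    void record(long windowId, Kind kind, String path, long offset) {
        eventsByWindow.computeIfAbsent(windowId, k -> new ArrayList<>())
                      .add(new Event(kind, path, offset));
    }

    // Called during replay: return exactly the events recorded for this window.
    List<Event> replay(long windowId) {
        List<Event> events = eventsByWindow.get(windowId);
        if (events == null) {
            return Collections.emptyList();
        }
        return Collections.unmodifiableList(events);
    }

    public static void main(String[] args) {
        FileEventLog log = new FileEventLog();
        // Window 1: the first file is finished and closed before the end window.
        log.record(1L, Kind.OPEN, "a.txt", 0L);
        log.record(1L, Kind.READ, "a.txt", 100L);
        log.record(1L, Kind.CLOSE, "a.txt", 100L);
        // Window 2: the next file is opened and processed.
        log.record(2L, Kind.OPEN, "b.txt", 0L);
        log.record(2L, Kind.READ, "b.txt", 40L);

        System.out.println("window 1: " + log.replay(1L));
        System.out.println("window 2: " + log.replay(2L));
    }
}
```

Because the CLOSE of the first file is stored with window 1, a replay of window 1 repeats it there instead of deferring it to window 2. A real operator would also need to checkpoint this state and trim entries for committed windows, which this sketch leaves out.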

This message was sent by Atlassian JIRA