Date: Tue, 4 Oct 2016 18:45:20 +0000 (UTC)
From: "Munagala V. Ramanath (JIRA)"
To: dev@apex.incubator.apache.org
Subject: [jira] [Comment Edited] (APEXMALHAR-2254) File input operator is not idempotent with closing files on replay

    [ https://issues.apache.org/jira/browse/APEXMALHAR-2254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15546182#comment-15546182 ]

Munagala V. Ramanath edited comment on APEXMALHAR-2254 at 10/4/16 6:45 PM:
---------------------------------------------------------------------------
It would be useful to define what we mean for this operator (AbstractFileInputOperator) to be idempotent. It doesn't quite work to define it as "same tuples output in the same windows, given same input", because this operator is abstract, has no ports and emits no tuples; concrete subclasses are expected to implement the emit() method to actually emit tuples. So a definition in terms of the calls to emit(), openFile() and closeFile() made in each window may be more meaningful.

Also, it would be useful to clarify what is expected of subclasses in order to maintain idempotency.

Finally, there are several other JIRAs related to this class, and it may be useful to keep them in mind too.
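One way to make the definition suggested in the comment concrete, sketched under assumptions: treat the ordered sequence of openFile(), emit() and closeFile() calls in each window as the observable behaviour, and call the operator idempotent for a window if a replay of that window produces exactly the same sequence. `WindowCallLog`, `record`, `callsIn` and `sameWindow` are hypothetical names for illustration only; they are not part of Apex Malhar.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical recorder (not an Apex Malhar class): logs, per window id,
// the ordered calls such as "openFile:a", "emit:a#0", "closeFile:a".
final class WindowCallLog {
  private final Map<Long, List<String>> callsByWindow = new HashMap<>();

  // Append one call to the sequence recorded for the given window.
  void record(long windowId, String call) {
    callsByWindow.computeIfAbsent(windowId, k -> new ArrayList<>()).add(call);
  }

  // Ordered calls recorded for a window (empty if none were recorded).
  List<String> callsIn(long windowId) {
    return callsByWindow.getOrDefault(windowId, Collections.emptyList());
  }

  // Under this assumed definition, a window is replayed idempotently
  // when the replay made exactly the same calls in the same order.
  static boolean sameWindow(WindowCallLog original, WindowCallLog replay, long windowId) {
    return original.callsIn(windowId).equals(replay.callsIn(windowId));
  }
}
```

With this definition, the scenario in the issue fails the check: if file `a` was closed in window 1 originally but only in window 2 on replay, both windows' call sequences differ, even though the emitted data is identical.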
> File input operator is not idempotent with closing files on replay
> ------------------------------------------------------------------
>
>                 Key: APEXMALHAR-2254
>                 URL: https://issues.apache.org/jira/browse/APEXMALHAR-2254
>             Project: Apache Apex Malhar
>          Issue Type: Bug
>            Reporter: Pramod Immaneni
>            Assignee: Pramod Immaneni
>
> With the file input operator, on replay after a failure, the same data is output as before the failure for every window replayed after the checkpoint. To do this, the operator keeps track of the files and offsets for every window and replays the data based on that.
> However, suppose that before the failure a file was fully processed and closed immediately before the end of a window, and the next file was opened and processed in a new window. On replay, the first file is not closed in the earlier window but in the later one. This can cause problems if an operator depends on file closing also happening in an idempotent manner.
> Improve the operator to also save the closing and opening of files in the idempotent state, so that these also happen in an idempotent manner.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
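The improvement proposed in the issue can be sketched as follows, assuming the per-window idempotent state records, besides the file and offsets, whether the file was opened and whether it was closed in that window. Replay then rebuilds each window's openFile/emit/closeFile sequence purely from saved state, so a close happens in the window where it originally happened. `FileSegment` and `IdempotentFileState` are hypothetical names; this is not the actual AbstractFileInputOperator state or its replay code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical record of what one file contributed to one window.
final class FileSegment {
  final String file;
  final long startOffset; // offset where reading started in this window
  final long endOffset;   // offset reached at the end of this window
  final boolean opened;   // file was opened in this window
  final boolean closed;   // file was closed in this window

  FileSegment(String file, long startOffset, long endOffset, boolean opened, boolean closed) {
    this.file = file;
    this.startOffset = startOffset;
    this.endOffset = endOffset;
    this.opened = opened;
    this.closed = closed;
  }
}

// Hypothetical idempotent state: window id -> ordered file segments.
final class IdempotentFileState {
  private final Map<Long, List<FileSegment>> byWindow = new HashMap<>();

  // Called while processing normally, once per file touched in the window.
  void save(long windowId, FileSegment segment) {
    byWindow.computeIfAbsent(windowId, k -> new ArrayList<>()).add(segment);
  }

  // Rebuilds the call sequence for a replayed window from saved state only.
  List<String> replay(long windowId) {
    List<String> calls = new ArrayList<>();
    for (FileSegment s : byWindow.getOrDefault(windowId, Collections.emptyList())) {
      if (s.opened) {
        calls.add("openFile:" + s.file);
      }
      calls.add("emit:" + s.file + "[" + s.startOffset + "," + s.endOffset + ")");
      if (s.closed) {
        // Close in the recorded window instead of lazily in a later one.
        calls.add("closeFile:" + s.file);
      }
    }
    return calls;
  }
}
```

For the scenario in the issue (file `a` finished and closed in window 1, file `b` opened in window 2), replaying window 1 yields `openFile:a`, `emit:a[0,100)`, `closeFile:a`, so the close no longer drifts into window 2.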