Date: Mon, 25 Aug 2014 05:39:57 +0000 (UTC)
From: "Shwetha G S (JIRA)"
To: dev@falcon.incubator.apache.org
Subject: [jira] [Commented] (FALCON-630) late data rerun for process broken in trunk

[ 
https://issues.apache.org/jira/browse/FALCON-630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14108764#comment-14108764 ]

Shwetha G S commented on FALCON-630:
------------------------------------

{quote}
Why do you need this? How is this different from feedNames? This same property can be overloaded with input names in the process and feed names in replication, no?
{quote}

An input name is different from a feed name: Falcon validates that input names are unique, but not that input feed names are. This is useful for many pipelines where data from different instances is handled differently. For example, de-duping of events across 2 hours is done by taking the (n-1)th and (n)th hour data for the same input feed. All the events in the (n-1)th hour are de-duped against the (n)th hour events. Such a process needs to define 2 inputs for the same feed. This is why late data is defined on input names, rather than on input feeds, in a process.

Currently, late data for process is broken because the workflow param holds feed names, while the late data section of the process has input names. So the comparison in the code is wrong.

> late data rerun for process broken in trunk
> --------------------------------------------
>
>                 Key: FALCON-630
>                 URL: https://issues.apache.org/jira/browse/FALCON-630
>             Project: Falcon
>          Issue Type: Bug
>          Components: rerun
>    Affects Versions: 0.5
>            Reporter: Samarth Gupta
>            Assignee: Shwetha G S
>            Priority: Blocker
>             Fix For: 0.4
>
>         Attachments: FALCON-630.patch
>
>
> Late data rerun for process is not working. It seems that in pre-processing, record size is storing data by feed name and not by input name, due to which late data is never detected.
> {code}
> -falconInputFeeds
> FETL2-RRLog#FETL-RTBS-PRLog#FETL-RTBS-NPRLog
> {code}
> Above, even though the param in the tasktracker logs says InputFeeds, the values are actually feed names.
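
The mismatch described in the comment above can be sketched in Java roughly as follows. This is only an illustration under stated assumptions, not Falcon's actual code: the class `LateInputLookup`, its methods, the input names `currentHour` and `previousHour`, the feed name `events`, and the paths are all hypothetical. Only the `#` separator is taken from the log excerpt in the issue.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: late-data entries are keyed by input name, so a
// workflow parameter carrying feed names never matches any of them.
public class LateInputLookup {

    /** Splits a '#'-separated parameter value into names, skipping empty parts. */
    public static List<String> split(String param) {
        List<String> names = new ArrayList<>();
        for (String part : param.split("#")) {
            if (!part.isEmpty()) {
                names.add(part);
            }
        }
        return names;
    }

    /** Returns the names in the parameter that have a late-data entry. */
    public static List<String> matchLateInputs(String workflowParam,
                                               Map<String, String> lateInputs) {
        List<String> matched = new ArrayList<>();
        for (String name : split(workflowParam)) {
            if (lateInputs.containsKey(name)) {
                matched.add(name);
            }
        }
        return matched;
    }

    public static void main(String[] args) {
        // Assumed process: two inputs on the same feed ("events"),
        // with late data configured per input name.
        Map<String, String> lateInputs = new LinkedHashMap<>();
        lateInputs.put("currentHour", "/late/current");
        lateInputs.put("previousHour", "/late/previous");

        // Parameter carrying feed names: nothing matches, so no late data is found.
        System.out.println(matchLateInputs("events#events", lateInputs));

        // Parameter carrying input names: both late-data entries are found.
        System.out.println(matchLateInputs("currentHour#previousHour", lateInputs));
    }
}
```

With feed names in the parameter, the two inputs also collapse into the same value (`events#events`), which is another reason a feed name cannot serve as the key when one feed is used by two inputs.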