Date: Tue, 22 Dec 2015 10:59:46 +0000 (UTC)
From: "Srikanth Sundarrajan (JIRA)"
To: dev@falcon.incubator.apache.org
Subject: [jira] [Commented] (FALCON-1686) Support for reprocessing

    [ https://issues.apache.org/jira/browse/FALCON-1686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15067956#comment-15067956 ]

Srikanth Sundarrajan commented on FALCON-1686:
----------------------------------------------

This is the classic backfill issue. Not sure if the effective time would solve it. I would think that the effective time feature would require the time to be between the start time and end time of the process definition. We have had this issue as well in the past, and the solution we used was the same as the one suggested by [~massdosage@gmail.com], which isn't ideal. Unless there is a hole in the data, a simple update of the process with an earlier start date should be supported.

> Support for reprocessing
> ------------------------
>
>                 Key: FALCON-1686
>                 URL: https://issues.apache.org/jira/browse/FALCON-1686
>             Project: Falcon
>          Issue Type: Improvement
>    Affects Versions: 0.7
>            Reporter: Mass Dosage
>
> We have a number of ETL jobs which we schedule to run on a regular basis with Falcon. This works fine. However, we often have cases where we need to run the exact same jobs over past date ranges in order to reprocess data after a code change. There doesn't seem to be any easy way to do this in Falcon at the moment. Ideally we'd have a controlled way of saying "run this process for dates between X and Y".
> There should also be a way to control whether downstream processes are triggered by the data being reprocessed or not. In some cases you may want downstream jobs to also run on the new data but in other cases you might not.
> With Oozie, if one wants to reprocess data from any time in history, one can update the start & end-dates (using the job.properties file) and submit a new coordinator to run alongside the existing one. As the coordinator-ids are unique they do not clash. In Falcon, processes are defined by their readable name so one would need to update that in the process file directly.
> We are currently working around this issue by making a copy of the original Falcon process, giving it a different name and changing the dates. This isn't ideal and leads to a lot of XML duplication.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)