Date: Mon, 2 May 2016 15:44:12 +0000 (UTC)
From: "Chris Riccomini (JIRA)"
To: commits@airflow.incubator.apache.org
Reply-To: dev@airflow.incubator.apache.org
Subject: [jira] [Commented] (AIRFLOW-30) Make preoperators part of the same transaction as the actual operation

    [ https://issues.apache.org/jira/browse/AIRFLOW-30?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15266837#comment-15266837 ]

Chris Riccomini commented on AIRFLOW-30:
----------------------------------------

Initially this seemed fine to me, but it actually sounds a little tricky. My concern is what happens if the DAG fails halfway through and the transaction is reverted. In that case, Airflow will show prior operators as having run successfully, but their state mutation will not have taken place. This effectively means that the entire DAG needs to be re-run, even if some of the operators are shown as having executed successfully.

> Make preoperators part of the same transaction as the actual operation
> ----------------------------------------------------------------------
>
>                 Key: AIRFLOW-30
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-30
>             Project: Apache Airflow
>          Issue Type: Improvement
>            Reporter: Bence Nagy
>
> All my use cases would work better if each operator executed everything in one transaction. Two examples:
> - I want to {{GenericTransfer}} a set of rows from one DB to another, and I have to create the table first in the destination DB. It would be a lot cleaner if I didn't have empty tables lying around when the insertion fails for some reason later on.
> - I want to {{GenericTransfer}} all rows from an entire table periodically to sync it from one DB to another. To do this correctly I want to clear the destination table first to make sure I end up with no duplicate rows, so I'd have a {{DELETE FROM dst_table}} preoperator. If the insertions fail afterwards, I'd end up with no data (in most cases it would be better to fall back to the old data), and even if everything is working correctly, I'll have an empty table while the insertions are still executing.
> To fix this, the relevant {{DbApiHook}} methods could support a new kwarg that sets whether they should commit at the end.
> Thoughts?

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
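
The second use case in the issue (clear the destination table, then reload it) can be sketched as follows. This is a minimal illustration only: it assumes Python's standard sqlite3 module as a stand-in for the destination database, and the function name replace_rows and the table columns are hypothetical. It does not use Airflow's DbApiHook; it only shows the clearing statement and the inserts sharing one transaction, so that a failed load falls back to the old data.

```python
import sqlite3


def replace_rows(conn, rows):
    """Clear dst_table and reload it inside a single transaction (hypothetical helper)."""
    # "with conn" commits if the block succeeds and rolls back if it raises.
    with conn:
        conn.execute("DELETE FROM dst_table")  # plays the role of the preoperator
        conn.executemany(
            "INSERT INTO dst_table (id, name) VALUES (?, ?)", rows
        )


# Set up an in-memory destination table that already holds "old" data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dst_table (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO dst_table (id, name) VALUES (1, 'old')")
conn.commit()

# A duplicate primary key makes the second insert fail part-way through the load.
bad_rows = [(10, "new-a"), (10, "new-b")]
try:
    replace_rows(conn, bad_rows)
except sqlite3.IntegrityError:
    pass  # the DELETE and the first INSERT are rolled back together

rows_after_failure = conn.execute("SELECT id, name FROM dst_table").fetchall()

# A clean load replaces the old data in one step.
replace_rows(conn, [(2, "fresh")])
rows_after_success = conn.execute("SELECT id, name FROM dst_table").fetchall()
```

After the failed load the table still holds the old row, and after the clean load it holds only the new one. Note that this covers a single operation; the concern raised in the comment, about task state recorded as successful before a later rollback, is a separate question that this sketch does not address.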