From: "Scott Wegner (JIRA)"
To: commits@beam.incubator.apache.org
Reply-To: dev@beam.incubator.apache.org
Date: Mon, 26 Sep 2016 20:28:20 +0000 (UTC)
Subject: [jira] [Created] (BEAM-680) Python Dataflow stages stale requirements-cache dependencies

Scott Wegner created BEAM-680:
---------------------------------

             Summary: Python Dataflow stages stale requirements-cache dependencies
                 Key: BEAM-680
                 URL: https://issues.apache.org/jira/browse/BEAM-680
             Project: Beam
          Issue Type: Bug
          Components: sdk-py
            Reporter: Scott Wegner
            Priority: Minor

When executing a Python pipeline using a requirements.txt file, the Dataflow runner stages every file found in its requirements cache directory: the dependencies specified in requirements.txt as well as any previously cached dependencies. This results in a bloated staging directory if previous pipeline runs from the same machine included different dependencies.

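The difference between the reported behaviour and the desired one can be illustrated with a minimal sketch. This is not the Beam SDK's staging code: the function names `stage_everything` and `stage_only_needed`, the `needed_names` parameter, and the file name `needed-package-1.0.tar.gz` are all hypothetical, and how the set of needed file names would be determined in practice is left open.

```python
import os
import tempfile


def stage_everything(cache_dir):
    """Mimic the reported behaviour: every file in the cache is staged."""
    return sorted(os.listdir(cache_dir))


def stage_only_needed(cache_dir, needed_names):
    """Stage only cached files whose names appear in needed_names.

    needed_names is a hypothetical input: the file names that the current
    pipeline's requirements.txt actually resolved to.
    """
    needed = set(needed_names)
    return sorted(name for name in os.listdir(cache_dir) if name in needed)


def demo():
    """Build a throwaway cache with one needed and one stale file."""
    with tempfile.TemporaryDirectory() as cache_dir:
        for name in ("needed-package-1.0.tar.gz", "extra-file.txt"):
            # Create empty placeholder files standing in for cached downloads.
            open(os.path.join(cache_dir, name), "w").close()
        return (
            stage_everything(cache_dir),
            stage_only_needed(cache_dir, ["needed-package-1.0.tar.gz"]),
        )


if __name__ == "__main__":
    staged_all, staged_needed = demo()
    print(staged_all)     # includes the stale extra-file.txt
    print(staged_needed)  # only the file this pipeline needs
```

Filtering by an explicit list is only one possible approach; using a fresh cache directory per run would be another, at the cost of re-downloading dependencies each time.
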
Repro:

# Initialize a virtualenv and pip install apache_beam
# Create an empty requirements.txt file
# Create a simple pipeline using DataflowPipelineRunner and a requirements.txt file, for example: [my_pipeline.py|https://gist.github.com/swegner/6df00df1423b48206c4ab5a7e917218a]
# {{touch /tmp/dataflow-requirements-cache/extra-file.txt}}
# Run the pipeline with a specified staging directory
# Check the staged files for the job

'extra-file.txt' will be uploaded with the job, along with any other cached dependencies under /tmp/dataflow-requirements-cache.

We should only be staging the dependencies necessary for a pipeline, not all previously-cached dependencies found on the machine.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)