Date: Tue, 10 Jan 2017 00:57:58 +0000 (UTC)
From: "Pei He (JIRA)"
To: commits@beam.incubator.apache.org
Subject: [jira] [Created] (BEAM-1252) BigQueryIO.Read: validate exported files with GCS glob.

Pei He created BEAM-1252:
----------------------------

             Summary: BigQueryIO.Read: validate exported files with GCS glob.
                 Key: BEAM-1252
                 URL: https://issues.apache.org/jira/browse/BEAM-1252
             Project: Beam
          Issue Type: Bug
          Components: sdk-java-gcp
            Reporter: Pei He
            Assignee: Pei He

BigQuery has started creating user-visible temp files that we notice and start reading from, but then they get moved. This can cause job failures and data duplication.

On the Beam side, we can add stronger validation:
1. When listing files, validate that they match the expected URI.
2. When BQ has finished the job, run an integrity check to verify that # files read from == # files BQ claims exist.
3. If possible, add a prefix to the filename of the glob (*.avro to step*.avro). Step name? Other? This might be as easy as dropping a '/' in the middle of the path. A la #7.
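
Checks 1 and 2 above could look roughly like the sketch below. It is not Beam code: the class ExportedFileValidator, its methods, and the assumed file naming (export directory followed by a numeric shard index and ".avro") are hypothetical, chosen only to illustrate the idea. Files that do not match the expected URI are skipped here; failing the job on them instead would be an equally reasonable choice.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical helper for illustration; not part of Beam or the BigQuery client. */
final class ExportedFileValidator {
  private final Pattern expectedUri;

  /** exportDir is the directory the export job wrote to, ending in '/'. */
  ExportedFileValidator(String exportDir) {
    // Assumption: exported files are named <exportDir><digits>.avro.
    // Anything else found by the glob (e.g. temp files in subpaths) will not match.
    this.expectedUri = Pattern.compile(Pattern.quote(exportDir) + "\\d+\\.avro");
  }

  /**
   * Check 1: keep only listed URIs that match the expected pattern.
   * Check 2: fail if the number kept differs from the count the export job reported.
   */
  List<String> validate(List<String> listedUris, long reportedFileCount) {
    List<String> matched = new ArrayList<>();
    for (String uri : listedUris) {
      if (expectedUri.matcher(uri).matches()) {
        matched.add(uri);
      }
    }
    if (matched.size() != reportedFileCount) {
      throw new IllegalStateException(
          "Expected " + reportedFileCount + " exported files but matched " + matched.size());
    }
    return matched;
  }
}
```

With this shape, a listing that contains a stray temp file under a subpath of the export directory is read without that file, and a listing that is missing an exported file raises an error before any data is read.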