Date: Mon, 11 Apr 2016 17:02:25 +0000 (UTC)
From: "Davor Bonaci (JIRA)"
To: commits@beam.incubator.apache.org
Subject: [jira] [Updated] (BEAM-167) TextIO can't read concatenated gzip files

     [ https://issues.apache.org/jira/browse/BEAM-167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Davor Bonaci updated BEAM-167:
------------------------------
    Component/s:     (was: sdk-java-extensions)
                 sdk-java-core

> TextIO can't read concatenated gzip files
> -----------------------------------------
>
>                 Key: BEAM-167
>                 URL: https://issues.apache.org/jira/browse/BEAM-167
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-java-core
>            Reporter: Eugene Kirpichov
>            Assignee: Luke Cwik
>
> $ cat <<END > header.csv
> a,b,c
> END
> $ cat <<END > body.csv
> 1,2,3
> 4,5,6
> 7,8,9
> END
> $ gzip -c header.csv > file.gz
> $ gzip -c body.csv >> file.gz
> The file is well-formed:
> $ gzip -dc file.gz
> a,b,c
> 1,2,3
> 4,5,6
> 7,8,9
> However, TextIO.Read.from("/path/to/file.gz") will read only "a,b,c" - reproducible even when the file is on local disk and with the DirectPipelineRunner.
> The bug is in CompressedSource. It uses GzipCompressorInputStream, which by default reads only the first gzip stream in the file, but has an option to read all of them. Previously (in Dataflow SDK 1.4.0) we used GZIPInputStream which reads all streams.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
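
The multi-stream behaviour described in the issue can be reproduced without Beam. The sketch below is only an illustration, not the CompressedSource code: it builds the same two-member file in memory and reads it back with the JDK's java.util.zip.GZIPInputStream, which the report says reads all streams. The class name ConcatenatedGzipDemo and its helper methods are hypothetical. The "option to read all of them" on GzipCompressorInputStream is, as far as I know, the two-argument constructor taking a decompressConcatenated flag in Apache Commons Compress; it is not used here so that the example needs only the JDK.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical demo class; not part of Beam or the Dataflow SDK.
class ConcatenatedGzipDemo {

    /** Compresses text into one complete gzip member (header, data, trailer). */
    static byte[] gzipMember(String text) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bytes)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Appends one member after another, like `gzip -c body.csv >> file.gz`. */
    static byte[] concat(byte[] first, byte[] second) {
        byte[] out = new byte[first.length + second.length];
        System.arraycopy(first, 0, out, 0, first.length);
        System.arraycopy(second, 0, out, first.length, second.length);
        return out;
    }

    /** Decompresses with GZIPInputStream and returns every decoded line. */
    static List<String> readLines(byte[] gzipBytes) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                new GZIPInputStream(new ByteArrayInputStream(gzipBytes)),
                StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        // Same content as header.csv and body.csv in the report.
        byte[] file = concat(gzipMember("a,b,c\n"),
                             gzipMember("1,2,3\n4,5,6\n7,8,9\n"));
        for (String line : readLines(file)) {
            System.out.println(line);
        }
    }
}
```

If the reader stopped after the first gzip member, as the report describes for the default GzipCompressorInputStream, readLines would return only "a,b,c"; a check of this kind could serve as a regression test for the fix.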