Date: Wed, 20 Sep 2017 06:31:01 +0000 (UTC)
From: "Eugene Kirpichov (JIRA)"
To: commits@beam.apache.org
Reply-To: dev@beam.apache.org
Subject: [jira] [Commented] (BEAM-2826) Need to generate a single XML file when write is performed on small amount of data

    [ https://issues.apache.org/jira/browse/BEAM-2826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16172818#comment-16172818 ]

Eugene Kirpichov commented on BEAM-2826:
----------------------------------------

This will be addressed as part of the FileIO.write() effort. However, what Luke suggests above will also work in practice as a workaround.

> Need to generate a single XML file when write is performed on small amount of data
> ----------------------------------------------------------------------------------
>
>                 Key: BEAM-2826
>                 URL: https://issues.apache.org/jira/browse/BEAM-2826
>             Project: Beam
>          Issue Type: New Feature
>          Components: beam-model
>    Affects Versions: 2.0.0
>            Reporter: Balajee Venkatesh
>            Assignee: Eugene Kirpichov
>
> I'm trying to write an XML file where the source is a text file stored in GCS. The code is running fine but instead of a single XML file, it is generating multiple XML files. (No. of XML files seem to follow total no.
of records present in the source text file.) I have observed this while using 'DataflowRunner'.
> When I run the same code locally, two files are generated. The first contains all the records with proper elements, and the second contains only the opening and closing root element.
> As I learnt, it is expected that it may produce multiple files: e.g. if the runner chooses to process your data by parallelizing it into 3 tasks ("bundles"), you'll get 3 files. Some of the parts may turn out empty in some cases, but the total data written will always add up to the expected data.
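
The behaviour described in the issue (one output file per bundle, some of them holding only the root element) can be illustrated with a small standalone simulation. This is a sketch only: it does not use Beam or XmlIO, and the helper names (split_into_bundles, write_bundle, read_records), the round-robin split, and the element names "records" and "record" are all hypothetical choices made for the illustration.

```python
import xml.etree.ElementTree as ET


def split_into_bundles(records, num_bundles):
    """Distribute records round-robin (a stand-in for a runner's bundling)."""
    bundles = [[] for _ in range(num_bundles)]
    for i, record in enumerate(records):
        bundles[i % num_bundles].append(record)
    return bundles


def write_bundle(bundle, root_name="records"):
    """Render one bundle as its own XML document, i.e. one 'file' per bundle."""
    root = ET.Element(root_name)
    for record in bundle:
        ET.SubElement(root, "record").text = record
    return ET.tostring(root, encoding="unicode")


def read_records(xml_text):
    """Return the text of every child element under the root."""
    return [element.text for element in ET.fromstring(xml_text)]


# Two records spread over three bundles: the third bundle is empty,
# so its "file" contains nothing but the root element.
records = ["a", "b"]
files = [write_bundle(bundle) for bundle in split_into_bundles(records, 3)]

# Reading all the files back recovers exactly the input records.
recovered = sorted(r for f in files for r in read_records(f))
```

With a single bundle (num_bundles=1) the same helpers produce one file containing every record, which is the outcome the issue asks for.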