Date: Mon, 11 Jul 2016 16:00:12 +0000 (UTC)
From: "Daniel Halperin (JIRA)"
To: commits@beam.incubator.apache.org
Reply-To: dev@beam.incubator.apache.org
Subject: [jira] [Commented] (BEAM-434) When examples write output to file it creates many output files instead of one

    [ https://issues.apache.org/jira/browse/BEAM-434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15371052#comment-15371052 ]

Daniel Halperin commented on BEAM-434:
--------------------------------------

It does seem very bad if the DirectRunner produces a bundle per key. I filed [BEAM-435].

> When examples write output to file it creates many output files instead of one
> ------------------------------------------------------------------------------
>
>                 Key: BEAM-434
>                 URL: https://issues.apache.org/jira/browse/BEAM-434
>             Project: Beam
>          Issue Type: Bug
>          Components: examples-java
>            Reporter: Amit Sela
>            Assignee: Amit Sela
>            Priority: Minor
>
> When using `TextIO.Write.to("/path/to/output")` without any restriction on the number of shards, it might generate many output files (depending on your input). For WordCount, for example, you'll get as many output files as there are unique words in your input.
> Since I think examples are expected to run in a friendly manner that lets you "see" what they do, rather than being optimized for performance, I suggest using `withoutSharding()` when writing the example output to an output file.
> Examples I could find that behave this way:
> org.apache.beam.examples.WordCount
> org.apache.beam.examples.complete.TfIdf
> org.apache.beam.examples.cookbook.DeDupExample

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
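
To illustrate the behavior described in the issue, here is a minimal sketch in plain Java that does not use Beam. It models a runner that emits one bundle per key and writes one file per bundle, and compares that with a single unsharded file. The class `ShardingSketch`, its method names, and the `prefix-SSSSS-of-NNNNN` file naming are assumptions made for illustration. They are not taken from the issue and are not Beam's actual implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model, not Beam code: shows why one output file per bundle
// yields one file per unique word when every key lands in its own bundle.
public class ShardingSketch {

    // Count occurrences of each word; TreeMap keeps iteration order deterministic.
    static Map<String, Integer> countWords(List<String> words) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String word : words) {
            counts.merge(word, 1, Integer::sum);
        }
        return counts;
    }

    // Assumed naming scheme: one file per bundle, named prefix-SSSSS-of-NNNNN.
    static List<String> shardedFileNames(String prefix, int numBundles) {
        List<String> names = new ArrayList<>();
        for (int i = 0; i < numBundles; i++) {
            names.add(String.format(Locale.ROOT, "%s-%05d-of-%05d", prefix, i, numBundles));
        }
        return names;
    }

    // Assumed unsharded behavior: a single file named exactly by the prefix.
    static List<String> unshardedFileNames(String prefix) {
        return Collections.singletonList(prefix);
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("to", "be", "or", "not", "to", "be");
        Map<String, Integer> counts = countWords(words);

        // One bundle per key: four unique words give four output files.
        System.out.println(shardedFileNames("/path/to/output", counts.size()));

        // Without sharding: one output file regardless of the input.
        System.out.println(unshardedFileNames("/path/to/output"));
    }
}
```

In this model the number of files depends only on how many bundles the runner produces, which is why the same pipeline can write one file on one runner and many on another.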