beam-commits mailing list archives

From "Kenneth Knowles (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (BEAM-434) When examples write output to file it creates many output files instead of one
Date Mon, 11 Jul 2016 21:43:11 GMT

    [ https://issues.apache.org/jira/browse/BEAM-434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15371700#comment-15371700 ]

Kenneth Knowles commented on BEAM-434:
--------------------------------------

I like all of these, but I actually like 2 and 3 a bit better than 1 for the reason you give:
they let users know that output is sharded when they just look at the output files.

For the same reason, I prefer 2 over 3, as it lets users know from the "other end" that sharding
has to be controlled explicitly.

> When examples write output to file it creates many output files instead of one
> ------------------------------------------------------------------------------
>
>                 Key: BEAM-434
>                 URL: https://issues.apache.org/jira/browse/BEAM-434
>             Project: Beam
>          Issue Type: Bug
>          Components: examples-java
>            Reporter: Amit Sela
>            Assignee: Amit Sela
>            Priority: Minor
>
> When using `TextIO.Write.to("/path/to/output")` without any restriction on the number
> of shards, it might generate many output files (depending on your input). For WordCount,
> for example, you'll get as many output files as unique words in your input.
> Since I think examples are expected to execute in a friendly manner, so that users can "see"
> what they do, rather than to optimize for performance, I suggest using `withoutSharding()`
> when writing the example output to a file.
> Examples I could find that behave this way:
> org.apache.beam.examples.WordCount
> org.apache.beam.examples.complete.TfIdf
> org.apache.beam.examples.cookbook.DeDupExample
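
To make the difference described in the issue concrete, here is a minimal stdlib-only sketch of the file names a sharded and an unsharded write would leave behind. It does not call Beam at all. It assumes the default shard name template is `-SSSSS-of-NNNNN` appended to the output prefix, and that `withoutSharding()` writes to exactly the given path. The class `ShardNames` and its methods are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class ShardNames {
    // Hypothetical helper: formats one shard's file name as prefix-SSSSS-of-NNNNN
    // (assumed to mirror the default shard name template).
    static String shardName(String prefix, int shard, int numShards) {
        return String.format(Locale.ROOT, "%s-%05d-of-%05d", prefix, shard, numShards);
    }

    // Lists the files a write would produce: one per shard when sharded,
    // or just the bare prefix when sharding is disabled.
    static List<String> outputFiles(String prefix, int numShards, boolean sharded) {
        List<String> names = new ArrayList<>();
        if (!sharded) {
            names.add(prefix);
            return names;
        }
        for (int i = 0; i < numShards; i++) {
            names.add(shardName(prefix, i, numShards));
        }
        return names;
    }

    public static void main(String[] args) {
        // Sharded: three files, each name revealing that the output is sharded.
        System.out.println(outputFiles("/path/to/output", 3, true));
        // Unsharded: a single file at exactly the requested path.
        System.out.println(outputFiles("/path/to/output", 1, false));
    }
}
```

With a sharded write, the `-of-NNNNN` suffix on each file is what tells a user looking at the output directory that the result is split across several files.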



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
