beam-commits mailing list archives

From "Florian Scharinger (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (BEAM-55) Allow users to compress FileBasedSink output files
Date Sun, 03 Jul 2016 23:44:10 GMT

    [ https://issues.apache.org/jira/browse/BEAM-55?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15360708#comment-15360708 ]

Florian Scharinger commented on BEAM-55:
----------------------------------------

Our use case is writing one Avro file per customer per day. Writing each file compressed
should significantly speed up both writing and reading, while still allowing scalable reading,
since we would have thousands of files per day. At the moment our Dataflow job
spends a significant amount of time just reading the (uncompressed) files.
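
To illustrate the point above, here is a minimal sketch in plain Python (standard library only, not the Beam or Dataflow API). It assumes gzip-compressed text files rather than Avro, and the function names and file-naming scheme are hypothetical. Each compressed file has to be read sequentially, but with one file per customer per day the parallelism comes from reading many files at once.

```python
import gzip
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def write_compressed(out_dir: Path, customer: str, day: str, records: list[str]) -> Path:
    """Write one gzip-compressed file for a (customer, day) pair (hypothetical naming)."""
    path = out_dir / f"{customer}-{day}.txt.gz"
    with gzip.open(path, "wt", encoding="utf-8") as f:
        for record in records:
            f.write(record + "\n")
    return path


def read_compressed(path: Path) -> list[str]:
    """Read one gzip file from start to end; a single gzip stream is not split."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f]


def read_all(paths: list[Path], workers: int = 8) -> dict[str, list[str]]:
    """Read many files concurrently: the parallelism is across files, not within one."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        contents = pool.map(read_compressed, paths)
        return {p.name: c for p, c in zip(paths, contents)}


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        out_dir = Path(tmp)
        # One small file per customer for a single day.
        paths = [
            write_compressed(out_dir, f"customer{i}", "2016-07-03", [f"event-{i}-{j}" for j in range(3)])
            for i in range(4)
        ]
        for name, records in read_all(paths).items():
            print(name, len(records))
```

With thousands of such files, the number of files (not the ability to split any one of them) bounds how far the read can be spread across workers.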

> Allow users to compress FileBasedSink output files
> --------------------------------------------------
>
>                 Key: BEAM-55
>                 URL: https://issues.apache.org/jira/browse/BEAM-55
>             Project: Beam
>          Issue Type: New Feature
>          Components: sdk-java-core
>            Reporter: Daniel Halperin
>            Priority: Minor
>
> FileBasedSink (also TextIO.Write, AvroIO.Write, etc.) does not have an option for compressing
> its output.
> In general, we discourage compression because it limits or blocks scalably reading from
> a file in parallel. However, users may want it -- so we should support the option (with appropriate
> warnings).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
