beam-commits mailing list archives

From "Jeffrey Payne (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (BEAM-55) Allow users to compress FileBasedSink output files
Date Thu, 29 Sep 2016 01:51:20 GMT

    [ https://issues.apache.org/jira/browse/BEAM-55?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15531516#comment-15531516 ]

Jeffrey Payne commented on BEAM-55:
-----------------------------------

We too prefer binary file formats like Avro or Parquet, for many reasons, including
automatic compression handling.  Unfortunately, we have several existing SLAs with clients
that require compressed CSV output; some even require a *single compressed CSV file*,
ugh.  What they do with the file once it's out of our hands is their problem :)

I'll read through the contribution guide, fork beam, and submit a PR.  Thanks again for the
direction!

> Allow users to compress FileBasedSink output files
> --------------------------------------------------
>
>                 Key: BEAM-55
>                 URL: https://issues.apache.org/jira/browse/BEAM-55
>             Project: Beam
>          Issue Type: New Feature
>          Components: sdk-java-core
>            Reporter: Daniel Halperin
>            Priority: Minor
>
> FileBasedSink (also TextIO.Write, AvroIO.Write, etc.) does not have an option for compressing
> its output.
> In general, we discourage compression because it limits or blocks scalable parallel reads of
> a file. However, users may want it, so we should support the option (with appropriate
> warnings).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
