beam-commits mailing list archives

From "Christopher Hebert (JIRA)" <j...@apache.org>
Subject [jira] [Created] (BEAM-2750) Read whole files as one element each
Date Mon, 07 Aug 2017 20:15:00 GMT
Christopher Hebert created BEAM-2750:
----------------------------------------

             Summary: Read whole files as one element each
                 Key: BEAM-2750
                 URL: https://issues.apache.org/jira/browse/BEAM-2750
             Project: Beam
          Issue Type: New Feature
          Components: sdk-java-core
            Reporter: Christopher Hebert
            Assignee: Davor Bonaci


I'd like to read whole files as one element each.

If my input files are hi.txt, what.txt, and yes.txt, then the whole contents of hi.txt form
one element of the returned PCollection, the whole contents of what.txt form another,
and so on, giving me a PCollection with three elements.

This contrasts with TextIO, which produces one element for every line of text in the input files.

This read (I'll call it WholeFileIO for now) would work like so:

{code:java}
PCollection<KV<String, byte[]>> fileNamesAndBytes =
    p.apply("Read", WholeFileIO.read().from("/path/to/input/dir/*"));
{code}
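
To make the intended semantics concrete, here is a minimal plain-JDK sketch of the difference between the two kinds of read. It is not Beam code and not a proposed implementation: WholeFileIO does not exist yet, and the class and method names below (WholeFileReadSketch, readWholeFiles, readLines) are hypothetical, chosen only for illustration. It assumes the inputs are small enough to hold in memory and, for the line-oriented case, that they are UTF-8 text.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration class; not part of Beam.
class WholeFileReadSketch {

    /** Whole-file semantics: one (file name, full contents) entry per matching file. */
    static Map<String, byte[]> readWholeFiles(Path dir, String glob) throws IOException {
        // TreeMap only makes the demo output deterministic; a PCollection has no order.
        Map<String, byte[]> filesByName = new TreeMap<>();
        try (DirectoryStream<Path> matches = Files.newDirectoryStream(dir, glob)) {
            for (Path file : matches) {
                if (Files.isRegularFile(file)) {
                    filesByName.put(file.getFileName().toString(), Files.readAllBytes(file));
                }
            }
        }
        return filesByName;
    }

    /** Line-oriented semantics, for contrast: one element per line across all matching files. */
    static List<String> readLines(Path dir, String glob) throws IOException {
        List<String> lines = new ArrayList<>();
        try (DirectoryStream<Path> matches = Files.newDirectoryStream(dir, glob)) {
            for (Path file : matches) {
                if (Files.isRegularFile(file)) {
                    lines.addAll(Files.readAllLines(file)); // decodes as UTF-8
                }
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("whole-file-sketch");
        Files.write(dir.resolve("hi.txt"), "hello\nworld\n".getBytes(StandardCharsets.UTF_8));
        Files.write(dir.resolve("what.txt"), "what\n".getBytes(StandardCharsets.UTF_8));
        Files.write(dir.resolve("yes.txt"), "yes\nyes\nyes\n".getBytes(StandardCharsets.UTF_8));

        Map<String, byte[]> whole = readWholeFiles(dir, "*");
        // prints "3 whole-file elements: [hi.txt, what.txt, yes.txt]"
        System.out.println(whole.size() + " whole-file elements: " + whole.keySet());
        // prints "6 line elements"
        System.out.println(readLines(dir, "*").size() + " line elements");
    }
}
```

The sketch uses a primitive byte[] for the file contents because that is what Files.readAllBytes returns. A boxed Byte[] would carry the same data with one object per byte.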





--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
