beam-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Borisa Zivkovic (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (BEAM-638) Add sink transform to write bounded data per window, pane, [and key] even when PCollection is unbounded
Date Wed, 03 May 2017 10:00:10 GMT

    [ https://issues.apache.org/jira/browse/BEAM-638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15994601#comment-15994601
] 

Borisa Zivkovic commented on BEAM-638:
--------------------------------------

Just to contribute to this, might be useful for further discussion - as beginner user of Beam
I also found it confusing that TextIO can not write data coming from Kafka directly. Looks
a bit inconsistent even if there is good reason why is it so. 

Then I tried to do 

myStringsComingFromKafka.apply(Window.<String>into(FixedWindows.of(Duration.standardSeconds(1))))
        .apply("WritingToOutput", TextIO.Write.to("/myOutputLocation/").withNumShards(5));

but this also does not work since I need groupByKey or combine after window...

In the official documentation there are few examples how to create bounded collection from
unbounded but they mostly focus on Combine and GroupByKey - it is not very clear what to do
if you are working with unbounded collections that are not KV and you do not want to combine
collections.

Basically all I want to do is 

1) read values (not KV) from kafka
2) write those values, as they are arriving, using TextIO

There might be an example how to do this but it was not easy for me to find it



> Add sink transform to write bounded data per window, pane, [and key] even when PCollection
is unbounded
> -------------------------------------------------------------------------------------------------------
>
>                 Key: BEAM-638
>                 URL: https://issues.apache.org/jira/browse/BEAM-638
>             Project: Beam
>          Issue Type: New Feature
>          Components: sdk-java-core
>            Reporter: Jean-Baptiste Onofré
>            Assignee: Davor Bonaci
>
> Today, if the pipeline source is unbounded, and the sink expects a bounded collection,
there's no way to use a single pipeline. Even a window creates a chunk on the unbounded PCollection,
but the "sub" PCollection is still unbounded.
> It would be helpful for users to have a Window function that create a bounded PCollection
(on the window) from an unbounded PCollection coming from the source.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

Mime
View raw message