flink-user mailing list archives

From Radu Tudoran <radu.tudo...@huawei.com>
Subject RE: Dummy DataStream
Date Fri, 27 Jan 2017 09:30:29 GMT
Hi Duck,

I am not 100% sure I understand your exact scenario, but I will try to give you some pointers;
maybe they will help.

Typically, when you split a stream, you know the criterion used for the split.
For example, if you follow the example from the website
https://ci.apache.org/projects/flink/flink-docs-release-1.2/dev/datastream_api.html

SplitStream<Integer> split = someDataStream.split(new OutputSelector<Integer>()
{
    @Override
    public Iterable<String> select(Integer value) {
        List<String> output = new ArrayList<String>();
        if (value % 2 == 0) {
            output.add("even");
        }
        else {
            output.add("odd");
        }
        return output;
    }
});

You then know that you have one stream for even and one for odd values, and you can collect them
in your list by doing

myList.add(split.select("even"));
myList.add(split.select("odd"));

That said, the SplitStream object itself already does more or less the same thing.
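To make the routing concrete without running a Flink job, here is a minimal plain-Java sketch that models split/select with in-memory lists. It is a stand-in under stated assumptions, not the Flink API: java.util.List plays the role of DataStream, and the hypothetical SplitModel.split sends each value to exactly one named output, whereas OutputSelector.select returns an Iterable and can send a value to several outputs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Plain-Java model of split/select: java.util.List stands in for DataStream.
// This is NOT the Flink API; SplitModel is a hypothetical illustration.
class SplitModel {

    // Route every value to the output named by the selector.
    // LinkedHashMap keeps output names in first-seen order.
    static <T> Map<String, List<T>> split(List<T> input, Function<T, String> selector) {
        Map<String, List<T>> outputs = new LinkedHashMap<>();
        for (T value : input) {
            outputs.computeIfAbsent(selector.apply(value), k -> new ArrayList<>()).add(value);
        }
        return outputs;
    }

    public static void main(String[] args) {
        List<Integer> someData = Arrays.asList(1, 2, 3, 4, 5);
        // Same even/odd rule as the OutputSelector above.
        Map<String, List<Integer>> outputs = split(someData, v -> v % 2 == 0 ? "even" : "odd");

        // Counterpart of myList.add(split.select("even")); myList.add(split.select("odd"));
        List<List<Integer>> myList = new ArrayList<>();
        myList.add(outputs.get("even"));
        myList.add(outputs.get("odd"));
        System.out.println(myList); // [[2, 4], [1, 3, 5]]
    }
}
```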

I would say you have two options to get your full stream back.
The first is the one from the website:
DataStream<Integer> all = split.select("even", "odd");
I believe this does not work for you, because you apply some operations to the individual splits.
The other option is union, which merges independent streams of the same type into one, without
a matching condition such as a join would need.

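You could do something like this, starting from the first stream in the list (assuming myList
is not empty) instead of from a dummy one:

DataStream<MyObject> allStream = myList.get(0);
for (DataStream<MyObject> stream : myList.subList(1, myList.size())) {
    allStream = allStream.union(stream);
}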
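The same fold over the list can be sketched in plain Java. Again, List stands in for DataStream, and a hypothetical UnionModel.union (simple concatenation) stands in for DataStream.union; this models the pattern and is not Flink code. Stream.reduce without an identity value starts from the first element, which is why no dummy base stream is needed, and it returns Optional.empty() when the list is empty.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Plain-Java model: List<T> stands in for DataStream<T> and concatenation
// stands in for DataStream.union. UnionModel is hypothetical, not Flink API.
class UnionModel {

    // Stand-in for a.union(b): keeps every element of both inputs,
    // with no join condition between them.
    static <T> List<T> union(List<T> a, List<T> b) {
        List<T> merged = new ArrayList<>(a);
        merged.addAll(b);
        return merged;
    }

    // Fold the list with the binary union. reduce() without an identity uses
    // the first element as the starting point, so no "dummy" base is needed;
    // an empty input yields Optional.empty() instead of a placeholder.
    static <T> Optional<List<T>> unionAll(List<List<T>> streams) {
        return streams.stream().reduce(UnionModel::union);
    }

    public static void main(String[] args) {
        List<List<Integer>> myList = Arrays.asList(
                Arrays.asList(2, 4),      // e.g. the processed "even" branch
                Arrays.asList(1, 3, 5));  // e.g. the processed "odd" branch
        System.out.println(unionAll(myList));                          // Optional[[2, 4, 1, 3, 5]]
        System.out.println(unionAll(new ArrayList<List<Integer>>())); // Optional.empty
    }
}
```

Against the real API, the same idea would presumably be myList.stream().reduce((a, b) -> a.union(b)), since DataStream.union returns a DataStream of the same element type; that variant is untested here. Note also that Flink's union gives no ordering guarantee across the merged streams; the concatenation above only keeps the model deterministic.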



From: Duck [mailto:kcud@protonmail.com]
Sent: Thursday, January 26, 2017 9:08 PM
To: user@flink.apache.org
Subject: Dummy DataStream

I have a project where I read a single DataStream from Kafka, then send it to a variable
number of handlers based on the content of the received data; after that I want to join them
all. Since I do not know how many different streams this will create, I cannot have a single
"base" to perform a join operation on. So my question is: can I create a "dummy / empty"
DataStream<MyObject> to use as a join basis?

Example:
1) DataStream<MyObject> all = ..
2) Create a List<DataStream<MyObject>> myList;
3) Then I split the "all" DataStream based on content, and add each stream to "myList"
4) I now parse each of the different streams...
5) I now want to join my list of streams, "myList", into a DataStream<MyObject> all_joined_again;

/Duck

