flink-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From smandrell <sdmandr...@berkeley.edu>
Subject Split Streams not working
Date Mon, 24 Jul 2017 20:14:47 GMT
Basically, we are not splitting the streams correctly because when we try to
select the stream we want from our splitStream (using the select()
operation), it never returns a DataStream with just ERROR_EVENT's or a
DataStream with just SUCCESS_EVENT's. Instead it returns a DataStream with
both ERROR_EVENT's and SUCCESS_EVENT's.



I am receiving data by doing the following:

return env.fromElements(SUCCESS_EVENT_JSON, SUCCESS_AND_ERROR_EVENT_JSON);

SUCCESS_EVENT_JSON will generate one success event once it is sent through
our parser. This is not the concern.

The concern is the SUCCESS_AND_ERROR_EVENT_JSON. SUCCESS_AND_ERROR_EVENT
will generate 3 events once it is sent through our parser: 1 success event
and 2 error events.


To discern between success events and error events in a given stream, we use
the following splitting logic:

<http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/file/n14418/parser.png>


This splitting logic works fine when dealing with the stream generated from
our parser on the SUCCESS_EVENT_JSON because there is only one event at play
here: the success event.

However, the splitting logic does not correctly split the stream generated
from sending SUCCESS_AND_ERROR_EVENT_JSON through our parser. 

For some context: when sending SUCCESS_AND_ERROR_EVENT_JSON through our
parser, the parser returns a DataStream<List&lt;SuperClassEvent>> in the
following form [ERROR_EVENT, ERROR_EVENT, SUCCESS_EVENT]. 

As you can see from the above code, we try to separate the ERROR_EVENT's
from the SUCCESS_EVENT by doing output.add("success") or output.add("error")
but when when we attempt to select the events in our
SplitStream[ERROR_EVENT, ERROR_EVENT, SUCCESS_EVENT] with
splitStream.select("success") and splitStream.select("error"), the different
events are not separated and both select() operations
(splitStream.select("success") & splitStream.select("error")) return two
DataStream[ERROR_EVENT, ERROR_EVENT, SUCCESS_EVENT]'s and not one
DataStream[ERROR_EVENT, ERROR_EVENT] and one DataStream[SUCCESS_EVENT].

My suspicion for this bug is that we are attempting to split a
DataStream<List&lt;TimeseriesEvt>> instead of a DataStream<TimeseriesEvt>,
but I cannot find a workaround for DataStream<List&lt;TimeseriesEvt>>.

Thanks!!






--
View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Split-Streams-not-working-tp14418.html
Sent from the Apache Flink User Mailing List archive. mailing list archive at Nabble.com.

Mime
View raw message