flink-user mailing list archives

From Magnus Vojbacke <magnus.vojba...@gmail.com>
Subject Re: Split a dataset
Date Tue, 17 Oct 2017 09:05:16 GMT
Thank you, Fabian! If batch semantics are not important to my use case, is there any way to
"downgrade" or convert a DataSet to a DataStream?


> On 17 Oct 2017, at 10:54, Fabian Hueske <fhueske@gmail.com> wrote:
> Hi Magnus,
> there is no Split operator on the DataSet API.
> As you said, this can be done using a FilterFunction. This also allows for non-binary splits:
> DataSet<X> setToSplit = ...
> DataSet<X> firstSplit = setToSplit.filter(new SplitCondition1());
> DataSet<X> secondSplit = setToSplit.filter(new SplitCondition2());
> DataSet<X> thirdSplit = setToSplit.filter(new SplitCondition3());
> where SplitCondition1, SplitCondition2, and SplitCondition3 are FilterFunctions that each
> filter out all records that don't belong to their split.
> Best, Fabian
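
The filter-based split described above can be tried out without a Flink dependency. What follows is a minimal plain-Java sketch, not Flink code: the class name SplitSketch, the three predicates, and the sample numbers are all hypothetical stand-ins for SplitCondition1 to SplitCondition3 and the real records. It only shows that applying several independent conditions to the same input yields more than two splits.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Plain-Java analogue of the filter-based split; no Flink classes are used.
class SplitSketch {

    // Hypothetical split conditions, standing in for SplitCondition1..3.
    static final Predicate<Integer> NEGATIVE = x -> x < 0;
    static final Predicate<Integer> SMALL = x -> x >= 0 && x < 10;
    static final Predicate<Integer> LARGE = x -> x >= 10;

    // Mirrors setToSplit.filter(condition): keeps only the matching records.
    static List<Integer> filter(List<Integer> input, Predicate<Integer> condition) {
        return input.stream().filter(condition).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> setToSplit = Arrays.asList(-5, 3, 42, 7, -1, 10);

        // Each split is produced by its own pass over the same input.
        System.out.println(filter(setToSplit, NEGATIVE)); // [-5, -1]
        System.out.println(filter(setToSplit, SMALL));    // [3, 7]
        System.out.println(filter(setToSplit, LARGE));    // [42, 10]
    }
}
```

In Flink, the body of each predicate would go into the filter(X value) method of a FilterFunction<X> passed to DataSet.filter(). Because every condition is evaluated independently, a record that matches several conditions ends up in several splits, and a record that matches none is dropped from all of them.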
> 2017-10-17 10:42 GMT+02:00 Magnus Vojbacke <magnus.vojbacke@gmail.com>:
> I'm looking for something like DataStream.split(), but for DataSets. I'd like to split
> my streaming data so messages go to different parts of an execution graph, based on arbitrary
> logic.
> DataStream.split() seems to be perfect, except that my source is a CSV file, and I have
> only found built-in functions for reading CSV files into a DataSet.
> I've evaluated using DataSet.filter(), but as far as I can tell, that only allows me
> to emulate a yes/no split. This is not ideal because it's too coarse, and I would prefer a
> more fine-grained split than that.
> Do you have any suggestions on how I can achieve my arbitrary splitting logic for a)
> DataSets in general, or b) CSV files?
