flink-user mailing list archives

From Fabian Hueske <fhue...@gmail.com>
Subject Re: Split a dataset
Date Tue, 17 Oct 2017 08:54:49 GMT
Hi Magnus,

there is no Split operator in the DataSet API.

As you said, this can be done using a FilterFunction. This also allows for
non-binary splits:

DataSet<X> setToSplit = ...
DataSet<X> firstSplit = setToSplit.filter(new SplitCondition1());
DataSet<X> secondSplit = setToSplit.filter(new SplitCondition2());
DataSet<X> thirdSplit = setToSplit.filter(new SplitCondition3());

where SplitCondition1, SplitCondition2, and SplitCondition3 are
FilterFunctions that filter out all records that don't belong to their split.
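Below is a fuller sketch of the same pattern applied to a CSV file. It assumes a two-column file (a String key and an Integer value) read with readCsvFile(). The class name FilterSplitSketch, the routing function bucketOf(), the BucketCondition filter and the thresholds 10 and 100 are all hypothetical. The input path and output directory come from args[0] and args[1].

```java
import org.apache.flink.api.common.functions.FilterFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;

public class FilterSplitSketch {

    // Number of splits produced by the routing logic below.
    static final int NUM_SPLITS = 3;

    // Hypothetical routing logic: assigns each record to split 0, 1 or 2
    // based on its numeric field. Replace with arbitrary application logic.
    public static int bucketOf(Tuple2<String, Integer> record) {
        if (record.f1 < 10) {
            return 0;
        } else if (record.f1 < 100) {
            return 1;
        }
        return 2;
    }

    // One FilterFunction instance per split: keeps only the records routed
    // to `bucket` and filters out everything else.
    public static class BucketCondition implements FilterFunction<Tuple2<String, Integer>> {
        private final int bucket;

        public BucketCondition(int bucket) {
            this.bucket = bucket;
        }

        @Override
        public boolean filter(Tuple2<String, Integer> record) {
            return bucketOf(record) == bucket;
        }
    }

    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        // Assumed input: a CSV file with two columns, e.g. "sensor-a,42".
        DataSet<Tuple2<String, Integer>> records = env
                .readCsvFile(args[0])
                .types(String.class, Integer.class);

        // Apply a different filter to the same DataSet for each split and
        // give every split its own sink.
        for (int i = 0; i < NUM_SPLITS; i++) {
            DataSet<Tuple2<String, Integer>> split = records.filter(new BucketCondition(i));
            split.writeAsCsv(args[1] + "/split-" + i);
        }

        // All sinks defined above are submitted together as a single job.
        env.execute("Split a DataSet with filters");
    }
}
```

Routing every condition through one function keeps the splits disjoint, so each record lands in exactly one split. Because each filter is independent, overlapping conditions (a record belonging to several splits) would work just as well.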

Best, Fabian

2017-10-17 10:42 GMT+02:00 Magnus Vojbacke <magnus.vojbacke@gmail.com>:

> I'm looking for something like DataStream.split(), but for DataSets. I'd
> like to split my streaming data so messages go to different parts of an
> execution graph, based on arbitrary logic.
> DataStream.split() seems to be perfect, except that my source is a CSV
> file, and I have only found built-in functions for reading CSV files into a
> DataSet.
> I've evaluated using DataSet.filter(), but as far as I can tell, that only
> allows me to emulate a yes/no split. This is not ideal because it's too
> coarse, and I would prefer a more fine grained split than that.
> Do you have any suggestions on how I can achieve my arbitrary splitting
> logic for a) DataSets in general, or b) CSV files?
