flink-user mailing list archives

From Fabian Hueske <fhue...@gmail.com>
Subject Re: Split a dataset
Date Tue, 17 Oct 2017 19:28:54 GMT
Unfortunately, it's not possible to bridge the gap between the DataSet and
DataStream APIs.

However, you can also use a CsvInputFormat in the DataStream API. Since
the DataStream API has no built-in method to configure CSV input, you would
have to create (and configure) the CsvInputFormat yourself.
Once you have the CsvInputFormat, you can create a DataStream using
StreamExecutionEnvironment.readFile(csvIF, filePath).

Hope this helps,
Fabian

2017-10-17 11:05 GMT+02:00 Magnus Vojbacke <magnus.vojbacke@gmail.com>:

> Thank you, Fabian! If batch semantics are not important to my use case, is
> there any way to "downgrade" or convert a DataSet to a DataStream?
>
> BR
> /Magnus
>
> On 17 Oct 2017, at 10:54, Fabian Hueske <fhueske@gmail.com> wrote:
>
> Hi Magnus,
>
> there is no Split operator on the DataSet API.
>
> As you said, this can be done using a FilterFunction. This also allows for
> non-binary splits:
>
> DataSet<X> setToSplit = ...
> DataSet<X> firstSplit = setToSplit.filter(new SplitCondition1());
> DataSet<X> secondSplit = setToSplit.filter(new SplitCondition2());
> DataSet<X> thirdSplit = setToSplit.filter(new SplitCondition3());
>
> where SplitCondition1, SplitCondition2, and SplitCondition3 are
> FilterFunctions that each filter out all records that don't belong to
> the respective split.
>
> Best, Fabian
>
> 2017-10-17 10:42 GMT+02:00 Magnus Vojbacke <magnus.vojbacke@gmail.com>:
>
>> I'm looking for something like DataStream.split(), but for DataSets. I'd
>> like to split my streaming data so messages go to different parts of an
>> execution graph, based on arbitrary logic.
>>
>> DataStream.split() seems to be perfect, except that my source is a CSV
>> file, and I have only found built-in functions for reading CSV files into a
>> DataSet.
>>
>> I've evaluated using DataSet.filter(), but as far as I can tell, that
>> only allows me to emulate a yes/no split. This is not ideal because it's
>> too coarse, and I would prefer a more fine-grained split than that.
>>
>>
>> Do you have any suggestions on how I can achieve my arbitrary splitting
>> logic for a) DataSets in general, or b) CSV files?
>>
>>
>
>
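
The filter-based split quoted above can be sketched roughly as follows. This is only an illustration under stated assumptions: plain Java lists and java.util.function.Predicate stand in for DataSet<X> and FilterFunction so that the sketch runs without Flink on the classpath, and the class name, the filter() helper, and the three conditions are hypothetical. In Flink itself, each condition would be a FilterFunction passed to DataSet.filter().

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical sketch: lists stand in for DataSet<X>, predicates for FilterFunction.
class FilterSplitSketch {

    // Stand-in for DataSet.filter(): keeps only the records the condition accepts.
    static <X> List<X> filter(List<X> input, Predicate<X> condition) {
        return input.stream().filter(condition).collect(Collectors.toList());
    }

    // Hypothetical split conditions; each accepts exactly the records of its own split.
    static final Predicate<Integer> SPLIT_CONDITION_1 = v -> v < 10;
    static final Predicate<Integer> SPLIT_CONDITION_2 = v -> v >= 10 && v < 100;
    static final Predicate<Integer> SPLIT_CONDITION_3 = v -> v >= 100;

    public static void main(String[] args) {
        List<Integer> setToSplit = Arrays.asList(3, 42, 7, 250, 99, 100);

        // Every split is derived from the same input, one filter per split.
        List<Integer> firstSplit = filter(setToSplit, SPLIT_CONDITION_1);
        List<Integer> secondSplit = filter(setToSplit, SPLIT_CONDITION_2);
        List<Integer> thirdSplit = filter(setToSplit, SPLIT_CONDITION_3);

        System.out.println(firstSplit);  // [3, 7]
        System.out.println(secondSplit); // [42, 99]
        System.out.println(thirdSplit);  // [250, 100]
    }
}
```

Because every filter is applied to the same input independently, a record that satisfies several conditions ends up in several splits, and a record that satisfies none is dropped. If the splits should form a partition of the input, the conditions need to be mutually exclusive and together cover all records, as in the sketch.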
