flink-dev mailing list archives

From Kostas Kloudas <k.klou...@data-artisans.com>
Subject Re: sampling function
Date Mon, 11 Jul 2016 09:31:31 GMT
Hi Do,

With the DataStream API you can always implement your own
sampling function (for example, inside a window function),
hopefully without too much effort.
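As a rough illustration of what such a hand-written sampler might look like, here is a minimal sketch of classic reservoir sampling (often called Algorithm R). It keeps a uniform random sample of at most k elements from a sequence whose length is not known in advance, such as the contents of a window. The class `ReservoirSampler` and its `sample` method are hypothetical and not part of Flink. The sketch uses only the JDK, on the assumption that it would be called from a Flink window function, which receives the window's elements as an `Iterable`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/**
 * Hypothetical helper (not a Flink API): uniform sampling of at most k elements,
 * without replacement, from an Iterable of unknown length (reservoir sampling).
 */
public final class ReservoirSampler {

    private ReservoirSampler() {}

    public static <T> List<T> sample(Iterable<T> elements, int k, Random rnd) {
        if (k < 0) {
            throw new IllegalArgumentException("k must be non-negative");
        }
        List<T> reservoir = new ArrayList<>(k);
        long seen = 0;
        for (T element : elements) {
            seen++;
            if (reservoir.size() < k) {
                // Fill the reservoir with the first k elements.
                reservoir.add(element);
            } else {
                // Pick a random index in [0, seen); it lands in the reservoir
                // with probability k / seen, in which case that slot is replaced.
                long j = (long) (rnd.nextDouble() * seen);
                if (j < k) {
                    reservoir.set((int) j, element);
                }
            }
        }
        return reservoir;
    }

    public static void main(String[] args) {
        // Stand-in for the elements of one window.
        List<Integer> window = new ArrayList<>();
        for (int i = 0; i < 1_000; i++) {
            window.add(i);
        }
        List<Integer> sample = sample(window, 10, new Random(42));
        System.out.println(sample.size() + " sampled elements: " + sample);
    }
}
```

The reservoir has a fixed size, so memory per window is bounded by k regardless of how many elements the window holds; if the window has fewer than k elements, all of them are returned.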

Adding such functionality to the API could be a good idea.
But given that there is no “one-size-fits-all” solution for
sampling (not every use case needs random sampling, and not
all random samplers fit all workloads), I am not sure we
should start adding different sampling operators.
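To make the trade-off concrete: a fixed-size reservoir sample has to buffer up to k elements per window, whereas Bernoulli sampling keeps each element independently with probability p. It needs no buffering and can run as a plain per-element filter on an unbounded stream, but the sample size is only p·n in expectation and varies from run to run. The following minimal sketch uses only the JDK; the class `BernoulliSampler` is hypothetical and not part of Flink's API, though the same per-element test could, for instance, be wrapped in a Flink `FilterFunction`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.function.Predicate;

/**
 * Hypothetical helper (not a Flink API): Bernoulli sampling keeps each element
 * independently with probability `fraction`. The decision is made per element,
 * so it can be expressed as a filter without buffering any elements.
 */
public final class BernoulliSampler<T> implements Predicate<T> {

    private final double fraction;
    private final Random rnd;

    public BernoulliSampler(double fraction, Random rnd) {
        if (fraction < 0.0 || fraction > 1.0) {
            throw new IllegalArgumentException("fraction must be in [0, 1]");
        }
        this.fraction = fraction;
        this.rnd = rnd;
    }

    /** Returns true if the element should be kept in the sample. */
    @Override
    public boolean test(T element) {
        // nextDouble() is uniform in [0, 1), so this is true with probability `fraction`.
        return rnd.nextDouble() < fraction;
    }

    public static void main(String[] args) {
        List<Integer> input = new ArrayList<>();
        for (int i = 0; i < 10_000; i++) {
            input.add(i);
        }
        BernoulliSampler<Integer> sampler = new BernoulliSampler<>(0.01, new Random(42));
        List<Integer> sample = new ArrayList<>();
        for (Integer x : input) {
            if (sampler.test(x)) {
                sample.add(x);
            }
        }
        // The sample size depends on the seed; it equals 100 only in expectation.
        System.out.println("kept " + sample.size() + " of " + input.size());
    }
}
```

Which sampler fits depends on the workload: a fixed sample size per window favors the reservoir approach, while stateless, constant-memory processing favors the Bernoulli approach.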


> On Jul 9, 2016, at 5:43 PM, Greg Hogan <code@greghogan.com> wrote:
> Hi Do,
> DataSet provides a stable @Public interface. DataSetUtils is marked
> @PublicEvolving, which means it is intended for public use and has stable
> behavior, but its method signatures may change. It's also good to limit
> DataSet to common methods, whereas the utility methods tend to be used for
> specific applications.
> I don't have my finger on the pulse of streaming, but this sounds like a
> useful feature that could be added.
> Greg
> On Sat, Jul 9, 2016 at 10:47 AM, Le Quoc Do <lequocdo@gmail.com> wrote:
>> Hi all,
>> I'm working on approximate computing using sampling techniques. I
>> noticed that Flink supports the sample function for DataSet
>> (org/apache/flink/api/java/utils/DataSetUtils.java). I'm just wondering why
>> you didn't merge the function into org/apache/flink/api/java/DataSet.java,
>> since the sample function works as a transformation operator.
>> My second question: are you planning to support the sample function
>> for DataStream (within windows)? I did not see it in the DataStream
>> code.
>> Thank you,
>> Do
