flink-user mailing list archives

From Vasiliki Kalavri <vasilikikala...@gmail.com>
Subject Re: sampling function
Date Mon, 11 Jul 2016 09:44:10 GMT
Hi Do,

Paris and Martha worked on sampling techniques for data streams on Flink
last year. If you want to implement your own samplers, you might find
Martha's master thesis helpful [1].
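
As a starting point for a hand-written sampler, here is a minimal sketch of
classic reservoir sampling (Algorithm R), which keeps a uniform random sample
of fixed size from a stream of unknown length. It is plain Java and does not
use any Flink API. The class name ReservoirSampler and its methods are
hypothetical names chosen for illustration, and this is not necessarily the
technique used in the thesis.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Hypothetical helper: fixed-size uniform sample over a stream (Algorithm R). */
final class ReservoirSampler<T> {
    private final int capacity;       // maximum number of elements kept
    private final Random random;      // seeded so that runs are reproducible
    private final List<T> reservoir;  // the current sample
    private long seen = 0;            // number of elements observed so far

    ReservoirSampler(int capacity, long seed) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.capacity = capacity;
        this.random = new Random(seed);
        this.reservoir = new ArrayList<>(capacity);
    }

    /** Offers one stream element to the sampler. */
    void add(T element) {
        seen++;
        if (reservoir.size() < capacity) {
            // Fill phase: the first `capacity` elements are always kept.
            reservoir.add(element);
        } else {
            // Pick a slot index in [0, seen); the new element replaces an
            // old one only if the index falls inside the reservoir, which
            // happens with probability capacity / seen.
            long slot = (long) (random.nextDouble() * seen);
            if (slot < capacity) {
                reservoir.set((int) slot, element);
            }
        }
    }

    /** Returns a copy of the current sample. */
    List<T> sample() {
        return new ArrayList<>(reservoir);
    }

    /** Returns how many elements have been offered so far. */
    long seenCount() {
        return seen;
    }
}
```

One way to use this in a streaming job, under the assumption that you sample
per window, is to create one sampler inside your own window function, call
add() for every element of the window, and emit sample() at the end.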

-Vasia.

[1]: http://kth.diva-portal.org/smash/get/diva2:910695/FULLTEXT01.pdf

On 11 July 2016 at 11:31, Kostas Kloudas <k.kloudas@data-artisans.com>
wrote:

> Hi Do,
>
> In DataStream you can always implement your own
> sampling function, hopefully without too much effort.
>
> Adding such functionality to the API could be a good idea.
> But given that in sampling there is no “one-size-fits-all”
> solution (as not every use case needs random sampling and not
> all random samplers fit all workloads), I am not sure if we
> should start adding different sampling operators.
>
> Thanks,
> Kostas
>
> > On Jul 9, 2016, at 5:43 PM, Greg Hogan <code@greghogan.com> wrote:
> >
> > Hi Do,
> >
> > DataSet provides a stable @Public interface. DataSetUtils is marked
> > @PublicEvolving which is intended for public use, has stable behavior,
> > but method signatures may change. It's also good to limit DataSet to
> > common methods whereas the utility methods tend to be used for specific
> > applications.
> >
> > I don't have the pulse of streaming but this sounds like a useful feature
> > that could be added.
> >
> > Greg
> >
> > On Sat, Jul 9, 2016 at 10:47 AM, Le Quoc Do <lequocdo@gmail.com> wrote:
> >
> >> Hi all,
> >>
> >> I'm working on approximate computing using sampling techniques. I
> >> noticed that Flink supports the sample function for DataSet
> >> (org/apache/flink/api/java/utils/DataSetUtils.java). I'm just wondering
> >> why you didn't merge the function into
> >> org/apache/flink/api/java/DataSet.java, since the sample function works
> >> as a transformation operator?
> >>
> >> My second question: are you planning to support the sample function
> >> for DataStream (within windows)? I did not see it in the DataStream
> >> code.
> >>
> >> Thank you,
> >> Do
> >>
>
>
