flink-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-1901) Create sample operator for Dataset
Date Thu, 20 Aug 2015 08:30:45 GMT

    [ https://issues.apache.org/jira/browse/FLINK-1901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14704508#comment-14704508 ]

ASF GitHub Bot commented on FLINK-1901:

Github user tillrohrmann commented on the pull request:

    @ChengXiangLi, you're right, I should have noticed earlier and raised a flag. But your
work is not in vain. I think it's an excellent piece of work, and the `sample` method could
also become part of the core API right away.
    For the sake of completeness, let's do that once the `sampleWithSize` method also works
robustly. I think your proposal for the next steps is a good way to continue. Once
you've moved the `sample` and `sampleWithSize` methods to the `DataSetUtils` class, we can close
and merge this PR. In the meantime, I'll create a JIRA for the topK operator, where we can
discuss the matter further.

> Create sample operator for Dataset
> ----------------------------------
>                 Key: FLINK-1901
>                 URL: https://issues.apache.org/jira/browse/FLINK-1901
>             Project: Flink
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Theodore Vasiloudis
>            Assignee: Chengxiang Li
> In order to be able to implement Stochastic Gradient Descent and a number of other machine
> learning algorithms we need to have a way to take a random sample from a Dataset.
> We need to be able to sample with or without replacement from the Dataset, choose the
> relative or exact size of the sample, set a seed for reproducibility, and support sampling
> within iterations.
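
To make the requirements in the issue description concrete, here is a minimal plain-Java sketch of two of the requested modes, both without replacement and both seeded: a relative-size sample (Bernoulli sampling, where each element is kept independently with probability `fraction`) and an exact-size sample (reservoir sampling, Algorithm R). The class and method names (`SamplingSketch`, `bernoulliSample`, `reservoirSample`) are hypothetical and chosen for illustration only; this is not the code from the pull request and it does not use the Flink API. Sampling with replacement and sampling inside iterations are not covered.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical helper for illustration; not part of Flink.
final class SamplingSketch {

    // Relative size, without replacement: keep each element independently
    // with probability `fraction`. The result size is only approximately
    // fraction * n, which is the trade-off of this mode.
    static <T> List<T> bernoulliSample(Iterable<T> input, double fraction, long seed) {
        if (fraction < 0.0 || fraction > 1.0) {
            throw new IllegalArgumentException("fraction must be in [0, 1]");
        }
        Random rnd = new Random(seed); // fixed seed gives a reproducible sample
        List<T> out = new ArrayList<>();
        for (T item : input) {
            if (rnd.nextDouble() < fraction) {
                out.add(item);
            }
        }
        return out;
    }

    // Exact size, without replacement: reservoir sampling (Algorithm R).
    // Returns min(size, n) elements after a single pass over the input.
    static <T> List<T> reservoirSample(Iterable<T> input, int size, long seed) {
        if (size < 0) {
            throw new IllegalArgumentException("size must be non-negative");
        }
        Random rnd = new Random(seed);
        List<T> reservoir = new ArrayList<>();
        int seen = 0;
        for (T item : input) {
            seen++;
            if (reservoir.size() < size) {
                // Fill the reservoir with the first `size` elements.
                reservoir.add(item);
            } else {
                // Replace a random slot with probability size / seen.
                int j = rnd.nextInt(seen);
                if (j < size) {
                    reservoir.set(j, item);
                }
            }
        }
        return reservoir;
    }
}
```

Calling either method twice with the same input and the same seed returns the same sample, which is the reproducibility property the issue asks for. The reservoir variant needs only one pass and memory proportional to the sample size, which is why it is a common choice when the exact output size matters.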

