flink-issues mailing list archives

From "Till Rohrmann (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-1901) Create sample operator for Dataset
Date Tue, 21 Jul 2015 07:39:04 GMT

    [ https://issues.apache.org/jira/browse/FLINK-1901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14634664#comment-14634664 ]

Till Rohrmann commented on FLINK-1901:
--------------------------------------

Hi Chengxiang,

Good to hear that you want to work on this. I can assign you the ticket. However, it is not
only about the sampling strategy but also about the integration within Flink. The reason is
that we have to make sure that the sampling operator also works within iterations. This means
that it has to be part of the dynamic path so that it is executed anew in every iteration.
This will need a special operator type.

But you can start with the sampling strategies and then continue with the iteration integration.

> Create sample operator for Dataset
> ----------------------------------
>
>                 Key: FLINK-1901
>                 URL: https://issues.apache.org/jira/browse/FLINK-1901
>             Project: Flink
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Theodore Vasiloudis
>
> In order to be able to implement Stochastic Gradient Descent and a number of other machine
> learning algorithms we need to have a way to take a random sample from a Dataset.
> We need to be able to sample with or without replacement from the Dataset, choose the
> relative size of the sample, and set a seed for reproducibility.
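
To make the three requirements in the description concrete (with or without replacement, relative sample size, seed), here is a minimal sketch in plain Java. It is not Flink code and not the design chosen for this ticket: the class name SamplingSketch and both method names are hypothetical, and it works on an in-memory List rather than a Dataset. It assumes Bernoulli sampling for the without-replacement case and Poisson sampling as an approximation of the with-replacement case; in both, the sample size is only approximately fraction times the input size.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Illustrative only: SamplingSketch and its methods are hypothetical names,
// not part of Flink's API.
final class SamplingSketch {

    // Without replacement (Bernoulli sampling): keep each element independently
    // with probability `fraction`, so no element is emitted more than once.
    static <T> List<T> sampleWithoutReplacement(List<T> input, double fraction, long seed) {
        if (fraction < 0.0 || fraction > 1.0) {
            throw new IllegalArgumentException("fraction must be in [0, 1]");
        }
        Random rng = new Random(seed); // fixed seed makes the sample reproducible
        List<T> out = new ArrayList<>();
        for (T element : input) {
            if (rng.nextDouble() < fraction) {
                out.add(element);
            }
        }
        return out;
    }

    // With replacement (Poisson sampling, an assumption of this sketch): emit
    // each element k times, where k is drawn from Poisson(fraction).
    static <T> List<T> sampleWithReplacement(List<T> input, double fraction, long seed) {
        if (fraction < 0.0) {
            throw new IllegalArgumentException("fraction must be non-negative");
        }
        Random rng = new Random(seed);
        double limit = Math.exp(-fraction);
        List<T> out = new ArrayList<>();
        for (T element : input) {
            // Knuth's multiplication method; adequate for small fractions.
            int copies = 0;
            double p = rng.nextDouble();
            while (p > limit) {
                copies++;
                p *= rng.nextDouble();
            }
            for (int i = 0; i < copies; i++) {
                out.add(element);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> data = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            data.add(i);
        }
        System.out.println(sampleWithoutReplacement(data, 0.1, 42L).size());
        System.out.println(sampleWithReplacement(data, 0.1, 42L).size());
    }
}
```

Both methods make a single pass and decide per element, which is why they would not need to know the total input size up front. How such a strategy would be wired into the dynamic path of an iteration is a separate question that this sketch does not address.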



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)