flink-issues mailing list archives

From "Chengxiang Li (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-1901) Create sample operator for Dataset
Date Tue, 21 Jul 2015 09:17:05 GMT

    [ https://issues.apache.org/jira/browse/FLINK-1901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14634822#comment-14634822

Chengxiang Li commented on FLINK-1901:

Hi [~sachingoel0101], I searched the project for sampling-related keywords but did not find
any class related to sampling. Is the PR you mentioned still in progress? Besides ML algorithms,
other use cases depend on sampling, such as range partitioning, and I believe sampling is
itself a common operation that users may want to call directly.

> Create sample operator for Dataset
> ----------------------------------
>                 Key: FLINK-1901
>                 URL: https://issues.apache.org/jira/browse/FLINK-1901
>             Project: Flink
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Theodore Vasiloudis
>            Assignee: Chengxiang Li
> In order to be able to implement Stochastic Gradient Descent and a number of other machine
> learning algorithms we need to have a way to take a random sample from a Dataset.
> We need to be able to sample with or without replacement from the Dataset, choose the
> relative size of the sample, and set a seed for reproducibility.
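
The requirements in the issue description (with or without replacement, a relative sample size, and a seed) can be illustrated with a minimal standalone sketch. This is not Flink's API or the implementation proposed for FLINK-1901; the function names and the choice of Bernoulli and Poisson sampling are assumptions made only for illustration. Both variants yield a sample whose size equals `fraction * len(items)` in expectation, not exactly.

```python
import math
import random


def sample_without_replacement(items, fraction, seed=None):
    """Bernoulli sampling: keep each element independently with
    probability `fraction`, so no element appears more than once."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be in [0, 1]")
    rng = random.Random(seed)  # a fixed seed makes the sample reproducible
    return [x for x in items if rng.random() < fraction]


def _poisson(rng, lam):
    """Draw from Poisson(lam) using Knuth's multiplication method
    (adequate for the small rates used here)."""
    threshold = math.exp(-lam)
    count = 0
    product = rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def sample_with_replacement(items, fraction, seed=None):
    """Poisson sampling: emit each element k times, k ~ Poisson(fraction),
    which approximates drawing with replacement in a single pass."""
    if fraction < 0.0:
        raise ValueError("fraction must be non-negative")
    rng = random.Random(seed)
    sample = []
    for x in items:
        sample.extend([x] * _poisson(rng, fraction))
    return sample
```

Both functions make a single pass over the input and decide per element, which is the property that lets this style of sampling run independently on each partition of a distributed data set. Calling either function twice with the same seed returns the same sample.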

