flink-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-1901) Create sample operator for Dataset
Date Mon, 24 Aug 2015 07:25:45 GMT

    [ https://issues.apache.org/jira/browse/FLINK-1901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14708889#comment-14708889 ]

ASF GitHub Bot commented on FLINK-1901:

Github user tillrohrmann commented on the pull request:

    @sachingoel0101, you're right. The problem is that Flink gives no guarantee about the
order in which elements arrive. Setting the seed of all sampling operators to the same value
won't fix that, though: there might always be an operator, e.g. rebalance, that completely
randomizes the element order.
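
To illustrate the point, here is a minimal sketch in plain Python (not Flink code). It assumes
a hypothetical `bernoulli_sample` helper that draws one random number per element in arrival
order, and a hypothetical `rotate` helper that stands in for any reordering step such as
rebalance. With the same seed, the accept/reject decision for the i-th arriving element is
fixed, so a different arrival order selects different elements.

```python
import random


def bernoulli_sample(elements, fraction, seed):
    """Keep each element with probability `fraction`, using a seeded generator."""
    rng = random.Random(seed)
    # The i-th decision depends only on the seed and on i, not on the element itself.
    return [x for x in elements if rng.random() < fraction]


def rotate(elements, k):
    """Return the same elements starting k positions later (stand-in for a reordering)."""
    return elements[k:] + elements[:k]


data = list(range(1000))
reordered = rotate(data, 1)  # same elements, different arrival order

sample_a = bernoulli_sample(data, 0.5, seed=42)
sample_b = bernoulli_sample(data, 0.5, seed=42)
sample_c = bernoulli_sample(reordered, 0.5, seed=42)

print("same order, same seed, same sample:", sample_a == sample_b)
print("different order, same seed, same sample:", set(sample_a) == set(sample_c))
```

The seed pins down which positions in the arrival order are kept, not which elements, so a
fixed seed only gives a reproducible sample if the element order is reproducible as well.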

> Create sample operator for Dataset
> ----------------------------------
>                 Key: FLINK-1901
>                 URL: https://issues.apache.org/jira/browse/FLINK-1901
>             Project: Flink
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Theodore Vasiloudis
>            Assignee: Chengxiang Li
> In order to be able to implement Stochastic Gradient Descent and a number of other machine
> learning algorithms we need to have a way to take a random sample from a Dataset.
> We need to be able to sample with or without replacement from the Dataset, choose the
> relative or exact size of the sample, set a seed for reproducibility, and support sampling
> within iterations.
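
For reference, the sampling modes listed in the issue description can be sketched on a plain
Python sequence as follows. This is not the operator from the pull request. The function names
are hypothetical, and the exact-size variant uses standard reservoir sampling (Algorithm R),
which may differ from what the Flink implementation does.

```python
import random


def sample_fraction(elements, fraction, seed):
    """Relative size, without replacement: keep each element with probability `fraction`."""
    rng = random.Random(seed)
    return [x for x in elements if rng.random() < fraction]


def sample_exact(elements, k, seed):
    """Exact size, without replacement: one-pass reservoir sampling (Algorithm R)."""
    rng = random.Random(seed)
    reservoir = []
    for i, x in enumerate(elements):
        if i < k:
            # Fill the reservoir with the first k elements.
            reservoir.append(x)
        else:
            # Replace a random slot with probability k / (i + 1).
            j = rng.randint(0, i)  # both bounds inclusive
            if j < k:
                reservoir[j] = x
    return reservoir


def sample_with_replacement(elements, k, seed):
    """Exact size, with replacement: k independent uniform draws from a non-empty input."""
    rng = random.Random(seed)
    pool = list(elements)
    return [rng.choice(pool) for _ in range(k)]
```

For example, `sample_exact(range(100), 10, seed=1)` returns 10 distinct elements, while
`sample_with_replacement(range(3), 10, seed=1)` returns 10 elements with repeats. If the input
has fewer than `k` elements, `sample_exact` returns all of them.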

This message was sent by Atlassian JIRA