flink-issues mailing list archives

From sachingoel0101 <...@git.apache.org>
Subject [GitHub] flink pull request: [Flink-2030][ml]Online Histogram: Discrete and...
Date Tue, 23 Jun 2015 08:47:51 GMT
GitHub user sachingoel0101 opened a pull request:


    [Flink-2030][ml]Online Histogram: Discrete and Categorical

    This implements online histograms for both categorical and continuous data. For continuous
data, we emulate a continuous probability distribution that supports finding the cumulative sum
up to a particular value and finding the value at a specific cumulative probability (quantiles).
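
To make the continuous case concrete, here is a minimal Python sketch of a fixed-size streaming histogram in the style of the Ben-Haim and Tom-Tov paper linked in this description. It is not the code in this pull request: the class name `OnlineHistogram`, its method names, and the bisection-based `quantile` are assumptions made for illustration, and mass outside the range of the bin centroids is handled in a simplified way.

```python
import bisect


class OnlineHistogram:
    """Hypothetical sketch: at most max_bins [centroid, count] pairs, sorted by centroid."""

    def __init__(self, max_bins):
        self.max_bins = max_bins
        self.bins = []

    def add(self, value, count=1.0):
        # Insert the point as its own bin (or bump an existing centroid),
        # then shrink back to max_bins by merging the closest neighbours.
        centroids = [c for c, _ in self.bins]
        i = bisect.bisect_left(centroids, value)
        if i < len(self.bins) and self.bins[i][0] == value:
            self.bins[i][1] += count
        else:
            self.bins.insert(i, [value, count])
        self._shrink()

    def merge(self, other):
        # Union of both bin sets, then shrink back to max_bins.
        self.bins = sorted([list(b) for b in self.bins + other.bins])
        self._shrink()

    def _shrink(self):
        while len(self.bins) > self.max_bins:
            gaps = [self.bins[k + 1][0] - self.bins[k][0]
                    for k in range(len(self.bins) - 1)]
            k = gaps.index(min(gaps))
            (p1, m1), (p2, m2) = self.bins[k], self.bins[k + 1]
            # Replace the two closest bins by their count-weighted mean.
            self.bins[k:k + 2] = [[(p1 * m1 + p2 * m2) / (m1 + m2), m1 + m2]]

    def total(self):
        return sum(m for _, m in self.bins)

    def cumulative(self, b):
        """Estimated number of points <= b, using trapezoid interpolation."""
        # Simplification (assumption): nothing lies below the first centroid
        # and everything lies at or below the last one.
        if not self.bins or b < self.bins[0][0]:
            return 0.0
        if b >= self.bins[-1][0]:
            return self.total()
        centroids = [c for c, _ in self.bins]
        i = bisect.bisect_right(centroids, b) - 1
        (p1, m1), (p2, m2) = self.bins[i], self.bins[i + 1]
        frac = (b - p1) / (p2 - p1)
        m_b = m1 + (m2 - m1) * frac          # interpolated height at b
        partial = (m1 + m_b) / 2 * frac      # trapezoid area between p1 and b
        return sum(m for _, m in self.bins[:i]) + m1 / 2 + partial

    def quantile(self, q, tol=1e-9):
        """Value at cumulative probability q (assumes a non-empty histogram)."""
        lo, hi = self.bins[0][0], self.bins[-1][0]
        target = q * self.total()
        while hi - lo > tol:
            mid = (lo + hi) / 2
            if self.cumulative(mid) < target:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2
```

With three points 1, 2 and 3 and enough bins to hold them, `cumulative(2.5)` evaluates to 2.0 and `quantile(0.5)` converges to 2. The bisection search is chosen here for brevity; it only relies on `cumulative` being non-decreasing.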

    For categorical fields, we emulate a probability mass function which supports finding
the probability associated with every class.
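
The categorical case can be sketched in the same spirit: keep a count per class and divide by the total to obtain the probability mass function. This is an illustrative sketch only, and the class name `CategoricalHistogram` and its methods are hypothetical rather than taken from this pull request.

```python
from collections import Counter


class CategoricalHistogram:
    """Hypothetical sketch: per-class counts that act as an empirical PMF."""

    def __init__(self):
        self.counts = Counter()

    def add(self, category, count=1):
        self.counts[category] += count

    def merge(self, other):
        # Merging two histograms is just adding their counts class by class.
        self.counts.update(other.counts)

    def total(self):
        return sum(self.counts.values())

    def probability(self, category):
        # Unseen classes have probability 0; an empty histogram returns 0.0.
        total = self.total()
        return self.counts[category] / total if total else 0.0

    def pmf(self):
        total = self.total()
        return {c: n / total for c, n in self.counts.items()} if total else {}
```

Because merging only adds counts, partial histograms built on separate partitions of the data can be combined in any order and give the same result.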
    The continuous histogram follows this paper: http://www.jmlr.org/papers/volume11/ben-haim10a/ben-haim10a.pdf
    Note: This is a sub-task of https://issues.apache.org/jira/browse/FLINK-1727 which already
has a PR pending review at https://github.com/apache/flink/pull/710.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/sachingoel0101/flink online_histogram

Alternatively you can review and apply these changes as the patch at:


To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #861
commit ec50b4bb4faf91570724b4aa79783936d0a9487f
Author: Sachin Goel <sachingoel0101@gmail.com>
Date:   2015-06-23T08:40:57Z

    Online Histogram: Discrete and Categorical, Test Suites included


If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and you would like it, or if the feature is enabled but not working,
please contact infrastructure at infrastructure@apache.org or file a JIRA
ticket with INFRA.
