From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-2030) Implement an online histogram with Merging and equalization features
Date Thu, 20 Aug 2015 13:27:46 GMT


ASF GitHub Bot commented on FLINK-2030:
---------------------------------------

Github user sachingoel0101 commented on a diff in the pull request:

--- Diff: docs/libs/ml/statistics.md ---
@@ -0,0 +1,69 @@
+---
+mathjax: include
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+
+Unless required by applicable law or agreed to in writing,
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+-->
+
+* This will be replaced by the TOC
+{:toc}
+
+## Description
+
+ The statistics utility provides features such as building histograms over data.
+
+## Methods
+
+ The Statistics utility provides two major functions: createHistogram and
+ createDiscreteHistogram.
+
+### Creating a histogram
+
+ There are two types of histograms:
+   1. **Continuous Histograms**: These histograms are formed on a data set X: DataSet[Double]
+   when the values in X are from a continuous range. These histograms support
+   quantile and sum operations. Here quantile(q) refers to a value $x_q$ such
+   that $|x: x \leq x_q| = q * |X|$. Further, sum(s) refers to the number of elements $x \leq s$,
+   which can be construed as a cumulative probability value at $s$ [of course, a *scaled* probability].
+    A continuous histogram can be formed by calling X.createHistogram(b), where b is the
+    number of bins.
+   2. **Discrete Histograms**: These histograms are formed on a data set X: DataSet[Double]
+    when the values in X are from a discrete distribution. These histograms
+    support the count(c) operation, which returns the number of elements associated with category c.
+    <br>
+        A discrete histogram can be formed by calling MLUtils.createDiscreteHistogram(X).
--- End diff --

This was on a suggestion by Theo. It was decided to provide a pimp-my-class style function
only for ContinuousHistograms.
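
For reference, the quantile(q) and sum(s) semantics described in the diff can be illustrated exactly on a plain Scala collection. This is only a sketch of the definitions, not the histogram code in the pull request: `ExactStats` and its methods are hypothetical names, and taking the smallest qualifying element as the quantile is an assumption made here to get a unique answer.

```scala
// Hypothetical helper, not part of the pull request: exact versions of the
// two operations that a continuous histogram approximates.
object ExactStats {
  // sum(s): number of elements x in data with x <= s.
  def sum(data: Seq[Double], s: Double): Int = data.count(_ <= s)

  // quantile(q): smallest element x_q such that at least q * |X| elements
  // are <= x_q (assumption: ties are resolved by taking the smallest one).
  def quantile(data: Seq[Double], q: Double): Double = {
    require(data.nonEmpty, "data must be non-empty")
    require(q > 0.0 && q <= 1.0, "q must be in (0, 1]")
    val sorted = data.sorted
    val rank = math.ceil(q * sorted.size).toInt // 1-based rank
    sorted(math.max(rank, 1) - 1)
  }
}
```

A histogram with b bins would answer the same two queries approximately, using only the bins rather than the full data set.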

> Implement an online histogram with Merging and equalization features
> --------------------------------------------------------------------
>
>          Components: Machine Learning Library
>            Reporter: Sachin Goel
>            Assignee: Sachin Goel
>            Priority: Minor
>              Labels: ML
>
> For the implementation of the decision tree in https://issues.apache.org/jira/browse/FLINK-1727,
> we need to implement a histogram with online updates, merging and equalization features.
> A reference implementation is provided in [1].
> [1] http://www.jmlr.org/papers/volume11/ben-haim10a/ben-haim10a.pdf
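
As a rough illustration of the online-update and merge steps mentioned in the issue description, the sketch below keeps a bounded list of (value, count) bins and, whenever the limit is exceeded, replaces the two closest adjacent bins with their count-weighted mean. It follows the general idea of the referenced paper as I understand it, not the code in the pull request: `OnlineHistogram`, `maxBins`, `add`, `merge` and `shrink` are hypothetical names, and the equalization step is not shown.

```scala
// Hypothetical sketch, not the class from the pull request.
final case class OnlineHistogram(maxBins: Int,
                                 bins: Vector[(Double, Long)] = Vector.empty) {
  require(maxBins >= 1, "maxBins must be positive")

  // Online update: insert the point as a bin of count 1, then shrink.
  // Simplification: equal points are not coalesced until the limit is hit.
  def add(p: Double): OnlineHistogram =
    copy(bins = OnlineHistogram.shrink(bins :+ ((p, 1L)), maxBins))

  // Merge: pool the bins of both histograms, then shrink.
  def merge(other: OnlineHistogram): OnlineHistogram =
    copy(bins = OnlineHistogram.shrink(bins ++ other.bins, maxBins))

  // Total number of points represented by the histogram.
  def total: Long = bins.map(_._2).sum
}

object OnlineHistogram {
  // Repeatedly replace the two closest adjacent bins by their
  // count-weighted mean until at most maxBins bins remain.
  private def shrink(raw: Vector[(Double, Long)],
                     maxBins: Int): Vector[(Double, Long)] = {
    var bins = raw.sortBy(_._1)
    while (bins.size > maxBins) {
      val i = (0 until bins.size - 1).minBy(k => bins(k + 1)._1 - bins(k)._1)
      val (p1, m1) = bins(i)
      val (p2, m2) = bins(i + 1)
      val merged = ((p1 * m1 + p2 * m2) / (m1 + m2), m1 + m2)
      bins = bins.patch(i, Vector(merged), 2)
    }
    bins
  }
}
```

Because `merge` only pools bins and shrinks, partial histograms built on separate partitions can be combined pairwise, which is presumably what makes this structure attractive for a distributed decision tree.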

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

