flink-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-2030) Implement an online histogram with Merging and equalization features
Date Thu, 03 Sep 2015 07:37:46 GMT

    [ https://issues.apache.org/jira/browse/FLINK-2030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14728625#comment-14728625 ]

ASF GitHub Bot commented on FLINK-2030:
---------------------------------------

Github user chiwanpark commented on a diff in the pull request:

    https://github.com/apache/flink/pull/861#discussion_r38619843
  
    --- Diff: flink-java/src/main/java/org/apache/flink/api/java/utils/DataSetUtils.java ---
    @@ -248,6 +251,58 @@ public void mapPartition(Iterable<T> values, Collector<Tuple2<Long, T>> out) thr
     			input.getType(), sampleInCoordinator, callLocation);
     	}
     
    +	/**
    +	 * Creates a {@link org.apache.flink.api.common.accumulators.DiscreteHistogram} from the data set
    +	 *
    +	 * @param data Discrete valued data set
    +	 * @return A histogram over data
    +	 */
    +	public static DataSet<DiscreteHistogram> createDiscreteHistogram(DataSet<Double> data) {
    +		return data.mapPartition(new RichMapPartitionFunction<Double, DiscreteHistogram>() {
    +			@Override
    +			public void mapPartition(Iterable<Double> values, Collector<DiscreteHistogram> out)
    +					throws Exception {
    +				DiscreteHistogram histogram = new DiscreteHistogram();
    +				for (double value : values) {
    +					histogram.add(value);
    +				}
    +				out.collect(histogram);
    +			}
    +		}).reduce(new ReduceFunction<DiscreteHistogram>() {
    +			@Override
    +			public DiscreteHistogram reduce(DiscreteHistogram value1, DiscreteHistogram value2) throws Exception {
    +				value1.merge(value2);
    +				return value1;
    +			}
    +		});
    +	}
    +
    +	/**
    +	 * Creates a {@link org.apache.flink.api.common.accumulators.DiscreteHistogram} from the data set
    +	 *
    +	 * @param data Discrete valued data set
    +	 * @param bins Number of bins in the histogram
    +	 * @return A histogram over data
    +	 */
    +	public static DataSet<ContinuousHistogram> createContinuousHistogram(DataSet<Double> data, final int bins) {
    +		return data.mapPartition(new RichMapPartitionFunction<Double, ContinuousHistogram>() {
    +			@Override
    +			public void mapPartition(Iterable<Double> values, Collector<ContinuousHistogram> out)
    +					throws Exception {
    --- End diff --
    
    Same here (unnecessary new line)


> Implement an online histogram with Merging and equalization features
> --------------------------------------------------------------------
>
>                 Key: FLINK-2030
>                 URL: https://issues.apache.org/jira/browse/FLINK-2030
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Machine Learning Library
>            Reporter: Sachin Goel
>            Assignee: Sachin Goel
>            Priority: Minor
>              Labels: ML
>
> For the implementation of the decision tree in https://issues.apache.org/jira/browse/FLINK-1727, we need to implement a histogram with online updates, merging, and equalization features. A reference implementation is provided in [1].
> [1] http://www.jmlr.org/papers/volume11/ben-haim10a/ben-haim10a.pdf



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
