datafu-dev mailing list archives

From "Matthew Hayes (JIRA)" <>
Subject [jira] [Closed] (DATAFU-2) UDFs for entropy and weighted sampling algorithms
Date Thu, 06 Mar 2014 04:58:48 GMT


Matthew Hayes closed DATAFU-2.

> UDFs for entropy and weighted sampling algorithms
> -------------------------------------------------
>                 Key: DATAFU-2
>                 URL:
>             Project: DataFu
>          Issue Type: Task
>            Reporter: Matthew Hayes
>            Assignee: Matthew Hayes
>             Fix For: 1.3.0
>         Attachments: 0001-create-initial-version-of-entroy-UDFs.patch, 0002-update-a-few-comments-and-error-messages.patch,
0003-fix-a-bug-in-Entropy.accumulate-to-use-getFreq-metho.patch, 0004-update-entropy-implementation-following-code-review-.patch,
0005-update-javadocs.patch, 0006-update-javadocs.patch, 0007-update-the-javadocs-of-streaming-empirical-entropy-a.patch,
0008-update-entropy-udfs-based-on-code-review.patch, 0009-Implement-and-experiment-with-different-weighted-sam.patch,
0010-update-weighted-reservoir-sampler-constructor-unit-t.patch, 0011-update-licence-headers-and-move-streaming-entropy-to.patch,
> Jian Wang has suggested that we add UDFs for entropy and weighted random sampling and
has implementations for each of these ready.
> In Jian's words:
> "In the real world, there are occasions when we need to calculate the entropy of discrete
random variables, for instance, to calculate the mutual information between variables X and
Y using its entropy-based formula (mutual information calculation could be found at
I would suggest implementing a UDF to calculate the entropy of given input samples, following
the definition at
> This is the reference paper I use to learn about the weighted sampling algorithm:
> The present implementation uses Algorithm D.
> We may try Algorithm A, A-res and A-expJ since they could be used in data-stream and
distributed environments. These algorithms could be implemented based on
from this class?) since they also need a reservoir to store the selected items."
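
For readers unfamiliar with the two ideas in the description above, here is a rough Python sketch. It is not the code in the attached patches and not DataFu's API. The function names (empirical_entropy, weighted_reservoir_sample) are hypothetical. The entropy function assumes the usual empirical (plug-in) definition, H = -sum(p_i * log p_i), with p_i estimated from sample frequencies. The sampler assumes A-Res as it is commonly described: each item gets the key u^(1/w) for a uniform random u and weight w, and the k items with the largest keys are kept in a reservoir.

```python
import heapq
import math
import random
from collections import Counter


def empirical_entropy(samples, base=math.e):
    """Plug-in entropy estimate of a discrete sample (hypothetical helper)."""
    counts = Counter(samples)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    entropy = 0.0
    for count in counts.values():
        p = count / total
        entropy -= p * math.log(p)
    # Convert from nats to the requested logarithm base.
    return entropy / math.log(base)


def weighted_reservoir_sample(weighted_items, k, rng=None):
    """One-pass weighted sampling without replacement, A-Res style.

    weighted_items yields (item, weight) pairs. Items with a
    non-positive weight are skipped (an assumption of this sketch).
    """
    if k <= 0:
        return []
    rng = rng or random.Random()
    reservoir = []  # min-heap of (key, index, item); smallest key on top
    for index, (item, weight) in enumerate(weighted_items):
        if weight <= 0:
            continue
        key = rng.random() ** (1.0 / weight)
        if len(reservoir) < k:
            heapq.heappush(reservoir, (key, index, item))
        elif key > reservoir[0][0]:
            # The new key beats the smallest key currently kept.
            heapq.heapreplace(reservoir, (key, index, item))
    return [item for _, _, item in reservoir]
```

For example, empirical_entropy(["a", "b", "a", "b"], base=2) is 1.0 bit, since both values are equally frequent. The sampler needs only a single pass and O(k) memory, which is why this style of algorithm suits streaming input; the index in each heap entry only breaks ties so that the items themselves are never compared.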

This message was sent by Atlassian JIRA
