Date: Sun, 16 Feb 2014 13:05:19 +0000 (UTC)
From: "jian wang (JIRA)"
To: dev@datafu.incubator.apache.org
Subject: [jira] [Assigned] (DATAFU-26) Resolve entropy UDF naming conventions

[ https://issues.apache.org/jira/browse/DATAFU-26?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

jian wang reassigned DATAFU-26:
-------------------------------

    Assignee: jian wang

> Resolve entropy UDF naming conventions
> --------------------------------------
>
>                 Key: DATAFU-26
>                 URL: https://issues.apache.org/jira/browse/DATAFU-26
>             Project: DataFu
>          Issue Type: Task
>            Reporter: Matthew Hayes
>            Assignee: jian wang
>             Fix For: 1.3.0
>
>
> There are a couple of issues with the naming of the entropy UDFs that we should work out before the next release.
>
> StreamingEntropy supports multiple estimation methods. Entropy, however, supports only the empirical method. The supported constructors also differ as a result. Although Entropy's documentation states that it computes the empirical entropy, I think the name itself may lead to confusion.
>
> StreamingEntropy takes the data in sorted order. Using this sorted data, it computes counts, which are then used to compute entropy. Entropy, on the other hand, takes counts directly and computes entropy from them. These counts need to be computed before calling it. Our convention in DataFu has been that "Streaming" implies that the data does not need to be sorted, so StreamingEntropy is in conflict with this.
>
> My proposal is:
> 1) Rename Entropy to EmpiricalEntropy
> 2) Rename StreamingEntropy to Entropy
> 3) Clearly document why you would use EmpiricalEntropy over Entropy. It will be more efficient in some scenarios and we should explain this.
>
> One open question I have is whether we should distinguish in the name somehow that EmpiricalEntropy accepts counts, not the actual items themselves. EmpiricalCountBasedEntropy seems verbose.
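
For readers unfamiliar with the two input conventions contrasted in the issue, here is a minimal Python sketch. It is not DataFu's implementation: the function names `entropy_from_counts` and `entropy_from_sorted` are hypothetical, only the empirical (plug-in) estimator is shown, and the natural logarithm is assumed. The first function consumes precomputed counts, as the issue describes for Entropy. The second consumes sorted raw items and derives the counts itself, as the issue describes for StreamingEntropy.

```python
import math
from itertools import groupby


def entropy_from_counts(counts):
    """Empirical entropy (in nats) from per-item occurrence counts.

    Uses H = log(N) - (1/N) * sum(c * log(c)), where N is the total count.
    This is algebraically the same as -sum(p * log(p)) with p = c / N.
    """
    counts = [c for c in counts if c > 0]  # zero counts contribute nothing
    total = sum(counts)
    if total == 0:
        return 0.0
    return math.log(total) - sum(c * math.log(c) for c in counts) / total


def entropy_from_sorted(items):
    """Empirical entropy (in nats) from raw items that are already sorted.

    Because equal items are adjacent in sorted input, a single pass can
    count each run of equal items; the run lengths are the counts.
    """
    run_lengths = (sum(1 for _ in run) for _, run in groupby(items))
    return entropy_from_counts(run_lengths)
```

Both paths give the same value for the same data, for example `entropy_from_counts([2, 2])` and `entropy_from_sorted(["a", "a", "b", "b"])`. The count-based form can be cheaper when counts are already available from an earlier aggregation, which is presumably the efficiency point that item 3 of the proposal asks to have documented.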