Date: Wed, 6 Jul 2011 15:05:16 +0000 (UTC)
From: "Christoph Nagel (JIRA)"
To: dev@mahout.apache.org
Message-ID: <344537650.4230.1309964716836.JavaMail.tomcat@hel.zones.apache.org>
In-Reply-To: <659571771.1868.1309340012100.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] [Commented] (MAHOUT-747) Entropy implementation in Map/Reduce

[
https://issues.apache.org/jira/browse/MAHOUT-747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13060624#comment-13060624 ]

Christoph Nagel commented on MAHOUT-747:
----------------------------------------

Cool & Thanks. Looked at it and have only one question. Isn't *.common.mapreduce the best package for DoubleSumReducer, KeyCounterMapper, ValueCounterMapper, VarInSumReducer?
These are generic mappers and reducers, and while coding I was surprised that nobody had implemented them yet.

Regards,
Christoph.

> Entropy implementation in Map/Reduce
> ------------------------------------
>
>                 Key: MAHOUT-747
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-747
>             Project: Mahout
>          Issue Type: New Feature
>          Components: Math
>    Affects Versions: 0.6
>            Reporter: Christoph Nagel
>            Assignee: Sean Owen
>             Fix For: 0.6
>
>         Attachments: MAHOUT-747.patch
>
>
> Hi again,
> because I work a lot with entropy and information gain ratio, I want to implement the following distributed algorithms:
> * Entropy (https://secure.wikimedia.org/wikipedia/en/wiki/Entropy_%28information_theory%29)
> * Conditional Entropy (https://secure.wikimedia.org/wikipedia/en/wiki/Conditional_entropy)
> * Information Gain
> * Information Gain Ratio (https://secure.wikimedia.org/wikipedia/en/wiki/Information_gain_ratio)
> This issue initially covers only entropy.
> Some questions:
> * In which package do the classes belong? I put them in 'org.apache.mahout.math.stats' for now, but I don't know if this is right, because they are components of information retrieval.
> * Entropy only reads a set of elements. As input I took a sequence file with keys of type Text and values of any type, because I only work with the keys. Is this the best practice?
> * Is there a generic solution, so that the type of the keys can be anything inherited from Writable?
> Hadoop has a TokenCounterMapper, which emits each value with an IntWritable(1). I added a KeyCounterMapper to 'org.apache.mahout.common.mapreduce' which does the same with the keys.
> Will append my patch soon.
> Regards, Christoph.
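
The approach described in the quoted issue (emit each key with a count of 1, sum the counts per key, then compute the entropy from those counts) can be illustrated with a small in-memory sketch. This is not the code from MAHOUT-747.patch and it uses no Hadoop or Mahout classes: the class name `EntropySketch`, its method names, the use of `String` keys in place of `Text`, and the choice of log base 2 are all assumptions made for illustration only.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, single-process illustration of "count keys, then compute entropy".
public class EntropySketch {

    // Stands in for the map and sum-reduce steps: conceptually emits (key, 1)
    // for every record and adds up the ones per distinct key.
    static Map<String, Long> countKeys(List<String> keys) {
        Map<String, Long> counts = new HashMap<>();
        for (String key : keys) {
            counts.merge(key, 1L, Long::sum);
        }
        return counts;
    }

    // Shannon entropy in bits (base 2 is an assumption):
    // H = -sum_i p_i * log2(p_i), where p_i = count_i / total.
    static double entropy(Map<String, Long> counts) {
        long total = 0;
        for (long c : counts.values()) {
            total += c;
        }
        if (total == 0) {
            return 0.0; // no records: define the entropy as zero
        }
        double h = 0.0;
        for (long c : counts.values()) {
            double p = (double) c / total;
            h -= p * (Math.log(p) / Math.log(2));
        }
        return h;
    }
}
```

Only the per-key counts and the total are needed, which is why the values of the input records can be ignored and why a generic key-counting mapper is sufficient as the first stage.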