mahout-dev mailing list archives

From "Christoph Nagel (JIRA)" <>
Subject [jira] [Commented] (MAHOUT-747) Entropy implementation in Map/Reduce
Date Wed, 06 Jul 2011 15:05:16 GMT


Christoph Nagel commented on MAHOUT-747:

Cool, thanks! I looked at it and have only one question.
Isn't *.common.mapreduce the best package for DoubleSumReducer, KeyCounterMapper, ValueCounterMapper,
and VarInSumReducer? These are generic mappers and reducers, and while coding I was surprised that
nobody had implemented them yet.
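
To illustrate what such generic classes do, here is a minimal in-memory sketch in plain Java, with no Hadoop dependency. The class and method names (KeyCountSketch, mapKeys, sumByKey) are hypothetical and are not the API of the attached patch or of Mahout. mapKeys emits (key, 1) for every record and ignores the value, which is what the name KeyCounterMapper suggests; sumByKey adds up the emitted counts per key, as a summing reducer would after the shuffle. The only constraint on the key type is that it be Comparable, loosely mirroring Hadoop's requirement that map output keys be sortable.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class KeyCountSketch {

    // "Map" step (hypothetical stand-in for a key-counting mapper):
    // emit (key, 1) for every input record, ignoring the record's value.
    static <K, V> List<Map.Entry<K, Integer>> mapKeys(List<Map.Entry<K, V>> records) {
        List<Map.Entry<K, Integer>> out = new ArrayList<>();
        for (Map.Entry<K, V> record : records) {
            out.add(Map.entry(record.getKey(), 1));
        }
        return out;
    }

    // "Shuffle + reduce" step: group the pairs by key (sorted, via TreeMap)
    // and sum the partial counts, as a generic summing reducer would.
    static <K extends Comparable<K>> Map<K, Integer> sumByKey(List<Map.Entry<K, Integer>> pairs) {
        Map<K, Integer> sums = new TreeMap<>();
        for (Map.Entry<K, Integer> pair : pairs) {
            sums.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return sums;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, String>> records = List.of(
            Map.entry("a", "x"),
            Map.entry("b", "y"),
            Map.entry("a", "z"));
        System.out.println(sumByKey(mapKeys(records))); // prints {a=2, b=1}
    }
}
```

A ValueCounterMapper would be the same sketch with record.getValue() emitted instead of the key.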

Regards, Christoph.

> Entropy implementation in Map/Reduce
> ------------------------------------
>                 Key: MAHOUT-747
>                 URL:
>             Project: Mahout
>          Issue Type: New Feature
>          Components: Math
>    Affects Versions: 0.6
>            Reporter: Christoph Nagel
>            Assignee: Sean Owen
>             Fix For: 0.6
>         Attachments: MAHOUT-747.patch
> Hi again,
> because I work a lot with entropy and information gain ratio, I want to implement
> the following distributed algorithms:
> * Entropy (
> * Conditional Entropy (
> * Information Gain
> * Information Gain Ratio (
> For now, this issue covers only entropy.
> Some questions:
> * Which package do the classes belong in? For now I put them in 'org.apache.mahout.math.stats',
> but I don't know if this is right, because they are components of information retrieval.
> * Entropy only reads a set of elements. As input I took a sequence file with keys of
> type Text and values of any type, because I only work with the keys. Is this best practice?
> * Is there a generic solution, so that the type of keys can be anything inherited from
> In Hadoop there is a TokenCounterMapper, which tokenizes each value and emits every token with an IntWritable(1). I added
> a KeyCounterMapper into 'org.apache.mahout.common.mapreduce' which does the same with the
> I will attach my patch soon.
> Regards, Christoph.

This message is automatically generated by JIRA.

