mahout-dev mailing list archives

From "Robin Swezey (JIRA)" <j...@apache.org>
Subject [jira] Issue Comment Edited: (MAHOUT-605) Array returned by classifier.bayes.algorithm.CBayesAlgorithm.classifyDocument is sorted ascendant
Date Sun, 13 Feb 2011 08:40:57 GMT

    [ https://issues.apache.org/jira/browse/MAHOUT-605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12994030#comment-12994030
] 

Robin Swezey edited comment on MAHOUT-605 at 2/13/11 8:39 AM:
--------------------------------------------------------------

Robin, Sean

This is Robin S.

Thank you for your quick answer and your responsiveness.

Let me explain why my professor asked this question. In org.apache.mahout.classifier.bayes.algorithm.CBayesAlgorithm,
which we use to output a class scalar or class array for the most probable prefecture given
a news article, the following code appears in public ClassifierResult[] classifyDocument(String[]
document, Datastore datastore, String defaultCategory, int numResults):

for (String category : categories) {
     double prob = documentWeight(datastore, category, document);

However, we ran a diff on the documentWeight method called there, between the CNB version
and its NB counterpart, and the two are identical. The same holds for public ClassifierResult[]
classifyDocument(String[] document, Datastore datastore, String defaultCategory, int numResults).

The sorting code is identical in the NB and CNB Algorithm classes; the Collections.reverse(result);
line is present in both versions.

Are the weights different types of weights? One of the drivers for NB/CNB training seems to
differ in the case of CNB training (Complementary Bayes Theta Normalizer Driver); is this
related? If so, why does the result need to be sorted in ascending order (which is done in both cases)?

This portion of the code looks a little confusing, hence our question.

Thank you again for your responsiveness.

Robin S.

  
> Array returned by classifier.bayes.algorithm.CBayesAlgorithm.classifyDocument is sorted
ascendant
> -------------------------------------------------------------------------------------------------
>
>                 Key: MAHOUT-605
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-605
>             Project: Mahout
>          Issue Type: Bug
>          Components: Classification
>    Affects Versions: 0.4
>         Environment: Linux
>            Reporter: Robin Swezey
>            Assignee: Robin Anil
>            Priority: Minor
>              Labels: bayesian, classification
>             Fix For: 0.5
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> The array returned for an n-best call to classifyDocument is sorted in ascending order instead
of descending order.
> Ex:
> {quote}
> 47-best: [ClassifierResult\{category='香川県', score=32.28281232047167\},
> ClassifierResult\{category='宮崎県', score=32.28969992600906\}, ......,
> ClassifierResult\{category='愛知県', score=32.487981016587796\},
> ClassifierResult\{category='東京都', score=32.49189358054859\},
> ClassifierResult\{category='北海道', score=32.49811200756193\}]
> {quote}
> (classification of documents for Japanese prefectures)
> Inside the classifyDocument method, just before the return statement we found this line:
> {quote}
> Collections.reverse(result);
> {quote}
> Is this a mistake or a design choice? (we are not sure, hence the "Minor" priority)
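
For readers hitting the same behavior, here is a minimal sketch of what the report describes and of a caller-side way to get the highest score first. It assumes, as the report does, that the caller wants descending order. The Result class is a hypothetical stand-in for Mahout's ClassifierResult, and sortedDescending and isAscending are illustrative helper names, not Mahout API. Nothing here claims to reproduce what classifyDocument does internally; it only shows that Collections.reverse applied to a list already ordered highest-first yields the ascending order seen in the example above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class NBestOrderSketch {

    /** Hypothetical stand-in for Mahout's ClassifierResult; not the real class. */
    static final class Result {
        final String category;
        final double score;

        Result(String category, double score) {
            this.category = category;
            this.score = score;
        }
    }

    /** Returns a copy sorted by score, highest first, whatever order the input had. */
    static List<Result> sortedDescending(List<Result> results) {
        List<Result> copy = new ArrayList<>(results);
        copy.sort(Comparator.comparingDouble((Result r) -> r.score).reversed());
        return copy;
    }

    /** Returns true when no score is greater than the one that follows it. */
    static boolean isAscending(List<Result> results) {
        for (int i = 1; i < results.size(); i++) {
            if (results.get(i - 1).score > results.get(i).score) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Three scores taken from the report, listed highest first.
        List<Result> results = new ArrayList<>(Arrays.asList(
            new Result("北海道", 32.49811200756193),
            new Result("東京都", 32.49189358054859),
            new Result("愛知県", 32.487981016587796)));

        // Reversing a highest-first list leaves it lowest-first.
        Collections.reverse(results);
        System.out.println("ascending after reverse: " + isAscending(results));

        // Caller-side normalization, independent of the order received.
        List<Result> fixed = sortedDescending(results);
        System.out.println("top category: " + fixed.get(0).category);
    }
}
```

Sorting a copy on the caller side keeps client code independent of whichever order the library settles on once the question in this issue is answered.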

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

       
