mahout-user mailing list archives

From Robin Anil <robin.a...@gmail.com>
Subject Re: Classify() method results anomoly - help!
Date Wed, 30 Sep 2009 12:38:05 GMT
Hi Sandra, those scores are relative scores, not probabilities. Thanks
for bringing this to our notice; I will fix the documentation. You may
try the trunk and see whether the earlier error still occurs. Also,
could you tell me which version of Hadoop you are using?
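
A quick way to check whether the mismatch is still there: rank the
labels by raw score and confirm that the single best label is the first
entry of the top-N list. The sketch below is only an illustration and
does not use Mahout's classes. RelativeScores, topN, best, the label
names, and the higherIsBetter flag are all made up for the example, and
whether a higher or a lower score is "better" depends on the classifier,
so that is left as a parameter. The score values are the ones quoted in
this thread.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper class for illustration; not part of Mahout.
final class RelativeScores {

    // Rank labels by raw score and keep at most n of them.
    // The scores are only compared with each other, never read as probabilities.
    static List<Map.Entry<String, Double>> topN(
            Map<String, Double> scores, int n, boolean higherIsBetter) {
        List<Map.Entry<String, Double>> ranked = new ArrayList<>(scores.entrySet());
        Comparator<Map.Entry<String, Double>> byScore = Map.Entry.comparingByValue();
        ranked.sort(higherIsBetter ? byScore.reversed() : byScore);
        return new ArrayList<>(ranked.subList(0, Math.min(n, ranked.size())));
    }

    // The single best label is, by definition, the head of the ranked list.
    static String best(Map<String, Double> scores, boolean higherIsBetter) {
        if (scores.isEmpty()) {
            throw new IllegalArgumentException("no scores to rank");
        }
        return topN(scores, 1, higherIsBetter).get(0).getKey();
    }

    public static void main(String[] args) {
        // Label names are invented; the values are the scores quoted in the thread.
        Map<String, Double> scores = new LinkedHashMap<>();
        scores.put("sports", 0.0);
        scores.put("politics", 1659.930763537123);
        scores.put("tech", 20000.243434);

        boolean higherIsBetter = true; // assumption; flip if lower scores win
        String single = best(scores, higherIsBetter);
        List<Map.Entry<String, Double>> top = topN(scores, 20, higherIsBetter);

        System.out.println("single best: " + single);
        System.out.println("top list:    " + top);
        System.out.println("consistent:  " + top.get(0).getKey().equals(single));
    }
}
```

If the same comparison against the real classifier output prints an
inconsistent result on trunk as well, that would point to a bug rather
than a documentation problem.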



On Wed, Sep 30, 2009 at 5:30 PM, Sandra Clover <sclover@consultant.com> wrote:

> Thanks Grant,
>
> I'll look into that. I've been having a look at the numbers returned
> from the getScore() method also. I have noticed a range from 0 to
> around 20000.243434+, with numbers in between like 1659.930763537123.
>
> According to the API documentation for this method: "The label and the
> associated score(Usually probabilty)". This does not look like
> probability to me. I was kind of expecting an answer between 0 and 1,
> or 0 and 100, or something like that. Are these results typical or
> indicative of some sort of bug?
>
> Once again, comments/suggestions appreciated.
>
> Sandra.
>
>
>
>  ----- Original Message -----
>  From: "Grant Ingersoll"
>  To: mahout-user@lucene.apache.org
>  Subject: Re: Classify() method results anomoly - help!
>  Date: Tue, 29 Sep 2009 16:02:46 -0400
>
>
>
>  On Sep 29, 2009, at 8:47 AM, Sandra Clover wrote:
>
>  > Hi, I'm using Mahout 0.1 for document classification (using the
>  > distributed Bayesian Network) and I'm getting some answers back. I
>  > have noticed one thing that is really bugging me. I'm wondering,
>  > can you help please?
>  >
>  > Problem: Concerning the Classify() method, there are 2 constructors
>  > in the API. The first one returns just one answer (according to the
>  > API it returns "the single best category"). The second constructor
>  > says that it will "return the top numResults, ranked by score". My
>  > problem is that I have compared and contrasted the results of both
>  > techniques. I have noticed that the single best category does not
>  > appear at *all* in the range of categories given by the second
>  > constructor! Strange, no? I would have expected it to come top of
>  > the list. I have gone to a value of 20 deep in the numResults level
>  > and have not even seen the best category. Has anyone encountered
>  > this before? I would appreciate any
>  > comments/suggestions/user-experience that you may like to share.
>  >
>  > Thanks,
>  > Sandra.
>  >
>
>  That sounds like a bug. Can you try out the trunk version of
>  Mahout and see if it is still there? A lot of the classification
>  stuff has been reworked recently (I'm not even sure at the moment
>  that those two classify methods are even still in the code!)
>