mahout-user mailing list archives

From "Sandra Clover" <sclo...@consultant.com>
Subject Re: Classify() method results anomoly - help!
Date Wed, 30 Sep 2009 13:13:27 GMT
Hi Robin,

Thanks for the reply, for updating the documentation, and for your
advice. I'll try the trunk version. To answer your question, I am
using Mahout 0.1 and Hadoop 0.19.2. Hope this helps.

Thanks again, Robin.
Sandra

  ----- Original Message -----
  From: "Robin Anil"
  To: mahout-user@lucene.apache.org
  Subject: Re: Classify() method results anomoly - help!
  Date: Wed, 30 Sep 2009 18:08:05 +0530


  Hi Sandra, those scores are relative scores, not probabilities. Thanks
  for bringing this to our notice; I will fix the documentation. You may
  try the trunk and see whether the earlier error still occurs. Also,
  could you tell me which version of Hadoop you are using?
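
  Because the scores are relative, they are only meaningful when compared
  with each other, not against a fixed 0 to 1 scale. A minimal sketch of
  that idea in plain Java follows. It does not use the Mahout API: the
  class name, the method name, and the category labels are hypothetical,
  and only the score values come from this thread. It also assumes that a
  higher score means a better match, which this thread does not confirm.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Mahout.
class RelativeScores {

    // Orders labels by raw score, highest first.
    // Assumption: higher is better. If lower is better in your version,
    // drop the reversed() call.
    static List<String> rankByScore(Map<String, Double> scores) {
        List<Map.Entry<String, Double>> entries = new ArrayList<>(scores.entrySet());
        entries.sort(Map.Entry.<String, Double>comparingByValue().reversed());
        List<String> labels = new ArrayList<>();
        for (Map.Entry<String, Double> entry : entries) {
            labels.add(entry.getKey());
        }
        return labels;
    }

    public static void main(String[] args) {
        // Score values quoted in this thread; labels are made up.
        Map<String, Double> scores = new LinkedHashMap<>();
        scores.put("sports", 1659.930763537123);
        scores.put("politics", 20000.243434);
        scores.put("science", 0.0);
        System.out.println(rankByScore(scores)); // prints [politics, sports, science]
    }
}
```

  The point is only that the ordering carries the information; the
  magnitudes themselves should not be read as probabilities.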



  On Wed, Sep 30, 2009 at 5:30 PM, Sandra Clover wrote:

  > Thanks Grant, I'll look into that. I've also been looking at the
  > numbers returned from the getScore() method. I have noticed a range
  > from 0 to around 20000.243434+, with numbers in between like
  > 1659.930763537123. According to the API documentation for this
  > method: "The label and the associated score(Usually probabilty)".
  > This does not look like probability to me. I was expecting an answer
  > between 0 and 1, or 0 and 100, or something like that. Are these
  > results typical or indicative of some sort of bug? Once again,
  > comments/suggestions appreciated.
  > Sandra.
  >
  >
  >
  > ----- Original Message -----
  > From: "Grant Ingersoll"
  > To: mahout-user@lucene.apache.org
  > Subject: Re: Classify() method results anomoly - help!
  > Date: Tue, 29 Sep 2009 16:02:46 -0400
  >
  >
  >
  > On Sep 29, 2009, at 8:47 AM, Sandra Clover wrote:
  >
  > > Hi, I'm using Mahout 0.1 for document classification (using the
  > > distributed Bayesian Network) and I'm getting some answers back. I
  > > have noticed one thing that is really bugging me, and I'm wondering
  > > if you can help.
  > > Problem: The Classify() method has 2 overloads in the API. The
  > > first returns just one answer (according to the API it returns
  > > "the single best category"). The second says that it will "return
  > > the top numResults, ranked by score". I have compared the results
  > > of both. The single best category does not appear at *all* in the
  > > range of categories given by the second overload! Strange, no? I
  > > would have expected it to come top of the list. I have gone as deep
  > > as 20 in numResults and have still not seen the best category. Has
  > > anyone encountered this before? I would appreciate any
  > > comments/suggestions/user experience that you may like to share.
  > > Thanks,
  > > Sandra.
  > >
  >
  > That sounds like a bug. Can you try out the trunk version of
  > Mahout and see if it is still there? A lot of the classification
  > stuff has been reworked recently (I'm not even sure at the moment
  > that those two classify methods are even still in the code!)
  >
  >
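
The mismatch described at the start of this thread (the single best category missing from the top numResults list) can be turned into a small repeatable check. The sketch below is plain Java and deliberately does not call Mahout: the class and method names are hypothetical, and the labels in main stand in for whatever the two Classify() variants return in your setup.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Mahout.
class ClassifyConsistencyCheck {

    // Position of the single best label within the ranked list, or -1 if absent.
    static int positionOfBest(String bestLabel, List<String> rankedLabels) {
        return rankedLabels.indexOf(bestLabel);
    }

    // True when the ranked list is non-empty and starts with the single best label.
    static boolean isConsistent(String bestLabel, List<String> rankedLabels) {
        return !rankedLabels.isEmpty() && rankedLabels.get(0).equals(bestLabel);
    }

    public static void main(String[] args) {
        // Made-up labels: "finance" plays the role of the single best category,
        // and the list plays the role of the top numResults categories.
        String best = "finance";
        List<String> top = Arrays.asList("sports", "politics", "science");

        System.out.println("position of best: " + positionOfBest(best, top)); // prints -1
        System.out.println("consistent: " + isConsistent(best, top));         // prints false
    }
}
```

Running a check like this against both 0.1 and trunk on the same documents would show whether the rework mentioned above removed the discrepancy.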


