mahout-user mailing list archives

From Ted Dunning <ted.dunn...@gmail.com>
Subject Re: The default category of a binary classifier
Date Thu, 20 Sep 2012 01:05:11 GMT
With SGD, you can train for an unclassified category, but the system will
always produce scores for all trained categories.  You can interpret those
scores to decide when no decision should be made, but the model itself has
no direct concept of "unclassified".
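
A minimal sketch of that interpretation step, in plain Java with no Mahout
calls. It assumes you already have one score per trained category (for
example, the output of a classifier's scoring call copied into a double[]).
The class name, the UNCLASSIFIED marker, and the two thresholds are
hypothetical choices for illustration, not part of Mahout.

```java
import java.util.Arrays;

public class AbstainingDecision {
    /** Hypothetical marker for "no decision"; not a Mahout constant. */
    public static final int UNCLASSIFIED = -1;

    /**
     * Returns the index of the highest score, or UNCLASSIFIED when that
     * score is below minScore or beats the runner-up by less than minMargin.
     */
    public static int decide(double[] scores, double minScore, double minMargin) {
        int best = -1;
        double bestScore = Double.NEGATIVE_INFINITY;
        double secondScore = Double.NEGATIVE_INFINITY;
        for (int i = 0; i < scores.length; i++) {
            if (scores[i] > bestScore) {
                // New leader: the old leader becomes the runner-up.
                secondScore = bestScore;
                bestScore = scores[i];
                best = i;
            } else if (scores[i] > secondScore) {
                secondScore = scores[i];
            }
        }
        // Abstain on empty input, a weak winner, or a near tie.
        if (best < 0 || bestScore < minScore || bestScore - secondScore < minMargin) {
            return UNCLASSIFIED;
        }
        return best;
    }

    public static void main(String[] args) {
        // Assumed example scores; the thresholds 0.7 and 0.2 are arbitrary.
        double[][] examples = {
            {0.92, 0.08},
            {0.55, 0.45},
            {0.10, 0.15, 0.75}
        };
        for (double[] scores : examples) {
            System.out.println(Arrays.toString(scores) + " -> " + decide(scores, 0.7, 0.2));
        }
    }
}
```

The thresholds would need tuning on held-out data; the model only supplies
the scores, and the abstain rule lives entirely in your own code.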

On Wed, Sep 19, 2012 at 4:55 PM, Lance Norskog <goksron@gmail.com> wrote:

> Shouldn't this be 'unclassified'? I think I have seen data in the
> unclassified buckets with both Bayes and SGD.
>
> ----- Original Message -----
> | From: "Ted Dunning" <ted.dunning@gmail.com>
> | To: user@mahout.apache.org
> | Sent: Wednesday, September 19, 2012 2:54:25 PM
> | Subject: Re: The default category of a binary classifier
> |
> | If a classifier is presented with text that has no words in common
> | with the training data, it will give you back the most common category
> | in the training data.
> |
> | That said, it is likely to be quite rare for a new document to consist
> | *entirely* of new words.  Any overlap with the trained vocabulary is
> | likely to override the basic frequencies of the different categories.
> |
> | On Wed, Sep 19, 2012 at 1:32 AM, Salman Mahmood
> | <salman@influestor.com> wrote:
> |
> | > First, in Mahout, is there a special way to create a binary
> | > classifier? For instance, if I am creating a classifier for the
> | > 20 newsgroups data, I will just pass 20 as the number of categories
> | > when creating the training object:
> | >
> | > new AdaptiveLogisticRegression(20, FEATURES, new L1())
> | >
> | > Similarly, when creating a binary classifier, I will pass 2 as the
> | > number of categories and that's it?
> | >
> | > Having established that, what is the default category for a binary
> | > classifier? Let's say I am building a classifier to recognize the
> | > industry sector of a news item. I have binary models to classify
> | > things into technology or not technology, banking or not banking,
> | > health or not health, etc. I trained the technology model with
> | > technology-related news as positive and all the other news (banking
> | > and health) as negative. Now if the technology model gets a news
> | > item to classify from the media sector (which it was not trained
> | > on), what is the expected behavior? Is it going to say it's
> | > technology news or that it's not technology news? Is there any
> | > default behavior for unseen/untrained news items?
> | > Hope I made the question clear.
> | > Thanks
> |
>
