mahout-user mailing list archives

From Ted Dunning <ted.dunn...@gmail.com>
Subject Re: Query regarding use of classifier in Mahout
Date Wed, 20 Oct 2010 22:50:58 GMT
If this is testing on held-out data, then this is a pretty respectable
result for an untuned system.

Are these results on held-out data?
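
For readers unfamiliar with the term, "held-out" here means examples that the classifier never saw during training. A minimal sketch of such a split, assuming the training file has one "label<TAB>text" example per line (the function name and the 80/20 ratio are illustrative assumptions, not anything from this thread):

```python
import random

def split_held_out(lines, test_fraction=0.2, seed=42):
    """Shuffle the examples and set aside a fraction that is never trained on.

    Hypothetical helper: the name, the default fraction and the seed are
    assumptions made for illustration only.
    """
    shuffled = list(lines)                 # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)  # seeded, so the split is repeatable
    n_test = int(len(shuffled) * test_fraction)
    train = shuffled[n_test:]
    held_out = shuffled[:n_test]
    return train, held_out
```

Accuracy measured on `held_out` says something about new documents; accuracy measured on `train` mostly says how well the model memorized its input.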

On Wed, Oct 20, 2010 at 6:35 AM, JAGANADH G <jaganadhg@gmail.com> wrote:

> @robin and @ted
>
> I tested it in a different way.
> I created a program to convert input text to the Mahout training format. The
> program removes all punctuation and junk characters from the text, along
> with any numbers such as years and dates, and then converts the text
> to lowercase. After that, the text is written in the Mahout training
> format (label "\t" text "\n").
>
> After training with CBayesClassifier I tested it.
> The results are:
> 1) with ng=1 -a=1.0
> Correctly classified instances = 52.5%
> Incorrect = 47.5%
> 2) with ng=2 -a=1.0
> Correctly classified instances = 74.5%
> Incorrect = 25.5%
>
> Now I have a question.
> 1) The output of PrepareTwentyNewsgroups is text from which all the
> stop words have been removed, so the text is just a simple collection of
> words. When we apply generateNGramsWithoutLabel() to it, will it generate
> n-grams correctly (that is, will the n-grams be accurate)?
> --
> **********************************
> JAGANADH G
> http://jaganadhg.freeflux.net/blog
>
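
The preprocessing described in the quoted message could look roughly like the sketch below. This is not the original poster's program: the function names are hypothetical, and treating every character other than an ASCII letter or whitespace as "junk" is an assumption.

```python
import re

# Assumption: anything that is not an ASCII letter or whitespace counts as
# punctuation, a number, or a "junk" character and is replaced by a space.
_NON_LETTER = re.compile(r"[^A-Za-z\s]")

def clean_text(text):
    """Strip punctuation, digits and junk characters, then lowercase."""
    text = _NON_LETTER.sub(" ", text)
    text = text.lower()
    return " ".join(text.split())  # collapse runs of whitespace

def to_training_line(label, text):
    """Format one example as: label, tab, cleaned text, newline."""
    return "%s\t%s\n" % (label, clean_text(text))
```

For example, `to_training_line("sci.space", "NASA launched in 1998!!")` gives `"sci.space\tnasa launched in\n"`.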

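On the n-gram question in the quoted message, a small sketch can show what changes when stop words are removed before bigrams are built. It uses plain Python rather than Mahout's generateNGramsWithoutLabel(), whose internals are not reproduced here; the stop-word list and function names are illustrative assumptions.

```python
# Tiny illustrative stop-word list; a real analyzer uses a longer one.
STOP_WORDS = {"the", "of", "is", "a"}

def word_ngrams(tokens, n):
    """Return all runs of n consecutive tokens, joined by a space."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bigrams_after_stopword_removal(text):
    """Drop stop words first, then build bigrams from what is left."""
    tokens = [t for t in text.lower().split() if t not in STOP_WORDS]
    return word_ngrams(tokens, 2)
```

On "the university of california", bigrams over the raw tokens are "the university", "university of" and "of california", while bigrams built after stop-word removal reduce to the single pair "university california". The resulting bigrams are well formed, but they join words that were not adjacent in the original text.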