mahout-user mailing list archives

From Verachten Bruno <>
Subject RE: Classification: using the Java API always returns the same category
Date Mon, 16 Apr 2012 09:08:03 GMT

> If tweaking the algorithm and parameters does not help, I suggest taking a long hard look
> at your data.
> a. How many examples do you have of category3? Are they the vast majority?
It's the smallest set.

> b. Does category3 data overwhelm the other data? Recently, I tried to classify texts
> into 20 categories. The text documents from several categories were significantly longer
> (x100) than the other categories, so they dominated the classifier.
I see. I don't think that's what is happening with my data. The text is always in the same
size range (between 100 and 300 characters).
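
Both checks above (examples per category and typical text length per category) can be done in a few lines before training. The following is only a minimal sketch in plain Java: the Example class, the category names and the sample texts are hypothetical placeholders, and nothing here uses the Mahout API.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class CategoryStats {

    /** One labelled example. Hypothetical holder class, not a Mahout type. */
    public static final class Example {
        final String category;
        final String text;

        public Example(String category, String text) {
            this.category = category;
            this.text = text;
        }
    }

    /** Returns, per category, a pair {number of examples, total characters}. */
    public static Map<String, long[]> summarize(List<Example> examples) {
        Map<String, long[]> stats = new TreeMap<>();
        for (Example e : examples) {
            long[] s = stats.computeIfAbsent(e.category, k -> new long[2]);
            s[0]++;                   // example count
            s[1] += e.text.length();  // running total of characters
        }
        return stats;
    }

    public static void main(String[] args) {
        // Placeholder data; in practice this would be the real training set.
        List<Example> examples = Arrays.asList(
                new Example("category1", "first sample text"),
                new Example("category1", "second sample text, a bit longer"),
                new Example("category3", "another sample"));

        for (Map.Entry<String, long[]> entry : summarize(examples).entrySet()) {
            long count = entry.getValue()[0];
            long chars = entry.getValue()[1];
            System.out.printf("%s: %d examples, %.1f characters on average%n",
                    entry.getKey(), count, (double) chars / count);
        }
    }
}
```

A category whose count or average length is far out of line with the others is the kind of imbalance described in the quoted advice.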

Well, I summarized the test results with a bigger test set, and got 84% success, which seems
quite good to me:
Found 12800 good guesses
Found 2430 bad guesses
Found 84 % good guesses
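
For reference, the percentage follows directly from the two counts: 12800 / (12800 + 2430) is roughly 84.04 %. The sketch below shows that arithmetic and also counts how often each category is predicted, which is a quick way to tell a truly constant classifier from an unlucky sample. It is plain Java written under assumptions: the class name, method names and the sample predictions are hypothetical, and it does not call the Mahout API.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

public class EvaluationSummary {

    /** Percentage of good guesses among all guesses. */
    public static double accuracyPercent(long good, long bad) {
        long total = good + bad;
        if (total == 0) {
            throw new IllegalArgumentException("no guesses to evaluate");
        }
        return 100.0 * good / total;
    }

    /** Counts how many times each category was predicted. */
    public static Map<String, Integer> countPredictions(List<String> predicted) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String label : predicted) {
            counts.merge(label, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Counts taken from the results reported above.
        System.out.printf(Locale.ROOT, "Found %.2f %% good guesses%n",
                accuracyPercent(12800, 2430));

        // Placeholder predictions; in practice, collect the classifier's output labels.
        List<String> predicted = Arrays.asList(
                "category1", "category3", "category3", "category2");
        System.out.println(countPredictions(predicted));
    }
}
```

If a large test set yields a single key in the prediction counts, the classifier really is returning one category; several keys with plausible counts suggest the small sample was simply unrepresentative.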

So... the "always returns the same category" was just bad luck when choosing my sample.
Sorry for the fuss.

Kind regards,
Bruno Verachten
