mahout-user mailing list archives

From Ted Dunning <ted.dunn...@gmail.com>
Subject Re: Help with Mahout Classification
Date Mon, 17 Jan 2011 15:10:51 GMT
On Mon, Jan 17, 2011 at 12:28 AM, Claudia Grieco <grieco@crmpa.unisa.it> wrote:

> > If you don't have truly massive volumes, then SGD is almost certainly a
> > better choice because it is simpler.
>
> By "simpler" you mean "faster" or "easier to code"?
>

I mean that the code itself is simpler.

As a result, it is both faster and easier to code.
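
To give a sense of how little code is involved, here is a minimal sketch of logistic regression trained by SGD. It is plain Python for illustration only, not Mahout's implementation; the function names, learning rate, and epoch count are all assumptions.

```python
import math
import random


def predict(weights, features):
    """Return the estimated probability that the label is 1."""
    z = sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def train_sgd_logistic(examples, num_features, epochs=50, learning_rate=0.1, seed=0):
    """Train binary logistic regression with one weight update per example.

    examples: list of (features, label) pairs, with label 0 or 1.
    epochs, learning_rate, and seed are illustrative defaults, not tuned values.
    """
    rng = random.Random(seed)
    weights = [0.0] * num_features
    data = list(examples)
    for _ in range(epochs):
        rng.shuffle(data)  # visit the examples in a fresh random order each pass
        for features, label in data:
            error = label - predict(weights, features)
            # Gradient step on the log-likelihood of this single example.
            for i, x in enumerate(features):
                weights[i] += learning_rate * error * x
    return weights


# Toy data: the first feature is a constant bias term.
toy = [([1.0, 2.0], 1), ([1.0, 1.0], 1), ([1.0, -1.0], 0), ([1.0, -2.0], 0)]
weights = train_sgd_logistic(toy, num_features=2)
```

The whole learner is one loop over the examples with a single update rule, which is where both the speed and the ease of coding come from.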


> As for the multiple categories problem...I was thinking of returning the
> top N categories to the user, or the ones whose score is more than a certain
> threshold...do you think it's fine?
>

Top N works.  A threshold can be a little difficult because longer documents
may give larger scores on average, and the growth may not be linear in
document length, so a single fixed cutoff may not suit all documents.

Try it.
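
Here is a rough sketch of the two selection rules side by side. The category names and scores are made up for illustration (the long document's scores are simply the short document's scaled up), and `top_n` and `above_threshold` are hypothetical helpers, not Mahout calls.

```python
import heapq


def top_n(scores, n):
    """Return the n highest-scoring (category, score) pairs, best first."""
    return heapq.nlargest(n, scores.items(), key=lambda kv: kv[1])


def above_threshold(scores, threshold):
    """Return every (category, score) pair scoring above threshold, best first."""
    kept = [(c, s) for c, s in scores.items() if s > threshold]
    return sorted(kept, key=lambda kv: kv[1], reverse=True)


# Hypothetical raw scores: same ranking, but the long document scores higher overall.
short_doc = {"sports": 2.1, "politics": 0.4, "tech": 1.3}
long_doc = {"sports": 8.4, "politics": 1.6, "tech": 5.2}

short_top = [c for c, _ in top_n(short_doc, 2)]
long_top = [c for c, _ in top_n(long_doc, 2)]
short_kept = [c for c, _ in above_threshold(short_doc, 1.5)]
long_kept = [c for c, _ in above_threshold(long_doc, 1.5)]
```

With these made-up numbers, top 2 picks sports and tech for both documents, while the fixed threshold of 1.5 keeps one category for the short document and all three for the long one.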
