mahout-user mailing list archives

From Ted Dunning <>
Subject Re: Help with Mahout Classification
Date Mon, 17 Jan 2011 15:10:51 GMT
On Mon, Jan 17, 2011 at 12:28 AM, Claudia Grieco <> wrote:

> > If you don't have truly massive volumes, then SGD is almost certainly a
> > better choice because it is simpler.
> By "simpler", do you mean "faster" or "easier to code"?

I mean that the code itself is simpler.

This results in it being faster and easier to code.
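To make "the code itself is simpler" concrete, here is a minimal sketch of plain stochastic gradient descent for binary logistic regression. It is not Mahout's SGD implementation and omits things a production learner would add; the function names (`train_sgd`, `predict`) and default hyperparameters are hypothetical, chosen only for illustration. The whole learner is one loop that nudges the weights by `(p - y) * x` for each example.

```python
import math
import random
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_sgd(
    data: Sequence[Tuple[List[float], int]],
    epochs: int = 200,
    learning_rate: float = 0.1,
    seed: int = 0,
) -> List[float]:
    """Fit binary logistic regression with plain SGD (hypothetical helper).

    Each example is (features, label) with label in {0, 1}.
    The returned weight list has one extra entry at the end for the bias.
    """
    rng = random.Random(seed)
    w = [0.0] * (len(data[0][0]) + 1)
    examples = list(data)  # copy so the caller's sequence is not shuffled
    for _ in range(epochs):
        rng.shuffle(examples)
        for x, y in examples:
            xb = x + [1.0]  # append a constant feature for the bias
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, xb)))
            # Gradient of the log-loss for one example is (p - y) * x.
            for i in range(len(w)):
                w[i] -= learning_rate * (p - y) * xb[i]
    return w


def predict(w: List[float], x: List[float]) -> float:
    """Probability of class 1 under weights w (bias is the last weight)."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x + [1.0])))


if __name__ == "__main__":
    toy = [([0.0], 0), ([1.0], 0), ([3.0], 1), ([4.0], 1)]
    weights = train_sgd(toy)
    print(predict(weights, [0.0]), predict(weights, [4.0]))
```

Because each update touches one example, the same loop works as an online learner that never needs the full data set in memory, which is part of why it stays small.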

> As for the multiple-categories problem... I was thinking of returning the
> top N categories to the user, or the ones whose score is more than a certain
> threshold. Do you think that's fine?

Top N works. A threshold can be a little difficult because longer documents
may give larger scores on average, and the effect may not be linear in document
length.

Try it.
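A small sketch of why top N is robust to document length while a fixed threshold is not, assuming a toy linear scorer (a dot product of per-category weights with term counts). The categories, weights, threshold, and helper names (`raw_scores`, `top_n`, `above_threshold`) are all hypothetical and not taken from any Mahout model. Repeating a document three times triples every raw score: the ranking stays the same, but more categories clear the fixed threshold.

```python
from typing import Dict, List

# Hypothetical per-category weights for a toy linear classifier
# (illustrative numbers only, not from any trained model).
WEIGHTS: Dict[str, Dict[str, float]] = {
    "sports": {"game": 1.0, "team": 0.8, "vote": -0.5},
    "politics": {"vote": 1.2, "team": 0.1, "game": -0.3},
    "tech": {"code": 1.5, "game": 0.4},
}


def raw_scores(term_counts: Dict[str, int]) -> Dict[str, float]:
    """Linear score per category: sum of weight * count over the document's terms."""
    return {
        label: sum(w.get(term, 0.0) * c for term, c in term_counts.items())
        for label, w in WEIGHTS.items()
    }


def top_n(scores: Dict[str, float], n: int) -> List[str]:
    """Return the n highest-scoring categories, best first."""
    return sorted(scores, key=scores.get, reverse=True)[:n]


def above_threshold(scores: Dict[str, float], threshold: float) -> List[str]:
    """Return every category whose score exceeds a fixed threshold, best first."""
    return [label for label in top_n(scores, len(scores)) if scores[label] > threshold]


if __name__ == "__main__":
    short_doc = {"game": 2, "team": 1, "code": 1}
    long_doc = {term: 3 * c for term, c in short_doc.items()}  # same text, 3x longer
    print(top_n(raw_scores(short_doc), 2))              # ['sports', 'tech']
    print(top_n(raw_scores(long_doc), 2))               # ['sports', 'tech']
    print(above_threshold(raw_scores(short_doc), 2.5))  # ['sports']
    print(above_threshold(raw_scores(long_doc), 2.5))   # ['sports', 'tech']
```

In this toy the scores grow exactly linearly with length because the scorer is a plain dot product. With a real model the growth may not be linear, so simply dividing scores by document length is not guaranteed to make a single threshold work; that is worth checking on held-out data.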
