mahout-user mailing list archives

From Chris Schilling <ch...@cellixis.com>
Subject Re: Strange results running SGD TrainNewsGroups example
Date Mon, 20 Dec 2010 22:11:00 GMT
Hey Ted,

Just FYI, 

I changed the Weight subclass of the ModelDissector to sort by signed value (rather than absolute
value) and reran it over the 20 Newsgroups data.  Here are the results of the dissector function:

body=rt	0.042	comp.sys.mac.hardware
body=computer	0.039	sci.electronics
body=seem	0.035	talk.religion.misc
body=mike	0.035	misc.forsale
body=windows	0.034	misc.forsale
body=just	0.032	sci.crypt
body=supports	0.032	talk.politics.mideast
body=x	0.032	talk.religion.misc
body=do	0.029	rec.motorcycles
body=university	0.028	comp.sys.mac.hardware
body=slagle	0.028	rec.sport.hockey
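For reference, the difference between the two orderings can be sketched as follows. This is a minimal, hypothetical illustration, not Mahout's actual ModelDissector code. `FeatureWeight`, `BY_ABSOLUTE`, `BY_SIGNED` and `topN` are invented names, and the sketch assumes Java 16+ for the record syntax. Sorting by magnitude ranks a strongly negative weight first. Sorting by signed value ranks only the most positive weights first.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class WeightSortSketch {
  // Hypothetical stand-in for a dissector entry: feature, weight, category.
  record FeatureWeight(String feature, double value, String category) {}

  // Largest magnitude first: negative weights can rank at the top.
  static final Comparator<FeatureWeight> BY_ABSOLUTE =
      Comparator.comparingDouble((FeatureWeight w) -> Math.abs(w.value())).reversed();

  // Largest signed value first: only the most positive weights rank at the top.
  static final Comparator<FeatureWeight> BY_SIGNED =
      Comparator.comparingDouble(FeatureWeight::value).reversed();

  // Returns the first n entries under the given ordering without mutating the input.
  static List<FeatureWeight> topN(List<FeatureWeight> weights,
                                  Comparator<FeatureWeight> order, int n) {
    List<FeatureWeight> sorted = new ArrayList<>(weights);
    sorted.sort(order);
    return sorted.subList(0, Math.min(n, sorted.size()));
  }

  public static void main(String[] args) {
    List<FeatureWeight> ws = List.of(
        new FeatureWeight("a", -3.0, "x"),
        new FeatureWeight("b", 2.0, "y"),
        new FeatureWeight("c", 0.5, "z"));
    System.out.println(topN(ws, BY_ABSOLUTE, 1).get(0).feature()); // prints a
    System.out.println(topN(ws, BY_SIGNED, 1).get(0).feature());   // prints b
  }
}
```

With these toy values, the magnitude ordering puts feature `a` (weight -3.0) first. The signed ordering puts `b` (weight 2.0) first.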

I prefer the results from MIA :)  Anyway, I know you are busy.  If there is anything I can
do to help, let me know.  I'm still getting familiar with the code, but I could help out given
some guidance.

Thanks a lot,
Chris

On Dec 17, 2010, at 7:37 PM, Ted Dunning wrote:

> Hard to say what changed offhand.  I was tweaking the SGD code pretty
> regularly as I learned from the results users were getting.  I should look
> at the history to review what happened... some changes may not have been
> good.
> 
> On Fri, Dec 17, 2010 at 5:28 PM, Chris Schilling
> <chris.schilling@gmail.com> wrote:
> 
>> Thanks for the answers, Ted.  I'll take a look inside the dissector.  I was
>> just wondering because the results are quite a bit different from what's in
>> the book (Listing 15.9).  Here are those results (words with weights
>> greater than 1).
>> 
>> body=space 2.1 sci.space
>> body=sale 1.9 misc.forsale
>> body=car 1.9 rec.autos
>> body=windows 1.8 comp.os.ms-windows.misc
>> body=mac 1.7 comp.sys.mac.hardware
>> body=bike 1.7 rec.motorcycles
>> body=apple 1.5 comp.sys.mac.hardware
>> body=gun 1.5 talk.politics.guns
>> body=baseball 1.5 rec.sport.baseball
>> body=graphics 1.5 comp.graphics
>> 
>> 
>> I guess I mostly want to understand what changed.  Again, I'll take a look
>> at the dissector, because the results of the training look pretty good.
>> 

