mahout-user mailing list archives

From Brian Clsrk
Subject Re: Any visualization scripts for graphing DataModel stats?
Date Mon, 28 Mar 2011 11:57:45 GMT
On 28/03/2011 06:51, Jeremy Lewi wrote:
> Another option is Python+Matplotlib+NumPy. For MATLAB users, Matplotlib
> provides equivalent plotting routines with nearly identical syntax.
> One of the reasons I spent time looking into JPype+Python+Mahout was so
> that I could visualize/inspect the output generated by Mahout (e.g.
> Vectors stored in sequence files) without having to convert to an
> intermediary format such as csv.
> J
> On Sun, 2011-03-27 at 21:37 -0700, Dmitriy Lyubimov wrote:
>> R is good.
>> RapidMiner has tons of visualizations and presumably has less of a
>> learning curve than R, but it would only work on modest datasets or subsamples.
>> On Sat, Mar 26, 2011 at 11:59 AM, Dan Brickley wrote:
>>> Hi
>>> Cutting across from M.I.A. forum -
>>> I've loaded a pile of ratings into Mahout and started tweaking a dozen or so
>>> flavours of Recommender with different components, settings. This is great,
>>> I'm getting somewhere and Mahout works.
>>> However this is a new dataset for me and I've not yet got a good feel for
>>> "what's in there". Since Mahout's datamodel CSV format is simple and
>>> regular, I suspect various other folk on this list already have utilities
>>> that consume it, and -being lazy- I thought I'd ask before blundering in and
>>> making my own. The kinds of question I have in mind are fairly pedestrian
>>> for now --- what the spread of rating values looks like, how many of the
>>> items have, say, 5 or more ratings; how many are super-popular and so on.
>>> I started toying with [learning] R for this, but before digging further --
>>> am I retreading known ground? Are there any scripts shared already? (I
>>> didn't manage to find much by searching). Does it make sense to have shared
>>> utilities for poking around inside a FileDataModel?
>>> Thanks for suggestions, pointers etc
>>> cheers,
>>> Dan
>>> ps. started learning R ->
>>>> ratings<- read.csv('2010ratingtests-datamodel.csv', sep=',')
>>>> names(ratings)<-c("userid","itemid","pref")
>>>> summary(ratings$pref)
>>>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
>>>   1.000   7.000   8.000   8.022  10.000  10.000
>>>> library(lattice)
>>>> histogram(ratings$pref)
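
The questions Dan raises above (the spread of rating values, how many items have 5 or more ratings, which items are the most popular) can also be answered without R. What follows is only a rough sketch in plain Python, assuming the file holds userid,itemid,pref rows with no header line and a numeric pref on every row. The function name datamodel_stats and its min_ratings parameter are made up for illustration and are not part of Mahout.

```python
import csv
from collections import Counter


def datamodel_stats(path, min_ratings=5):
    """Summarise a userid,itemid,pref CSV file (assumed: no header row)."""
    pref_counts = Counter()  # rating value -> how often it occurs
    item_counts = Counter()  # itemid -> number of ratings it received
    with open(path, newline="") as f:
        for row in csv.reader(f):
            if len(row) < 3:
                continue  # skip blank lines and rows without a pref column
            item, pref = row[1], float(row[2])
            pref_counts[pref] += 1
            item_counts[item] += 1
    return {
        # spread of rating values, sorted by value
        "pref_histogram": dict(sorted(pref_counts.items())),
        # number of distinct items seen
        "items_total": len(item_counts),
        # how many items have at least min_ratings ratings
        "items_with_min_ratings": sum(
            1 for n in item_counts.values() if n >= min_ratings
        ),
        # the ten most-rated items as (itemid, count) pairs
        "most_rated": item_counts.most_common(10),
    }
```

Calling datamodel_stats('2010ratingtests-datamodel.csv') would return a dict with the rating histogram, the number of items at or above the threshold, and the ten most-rated items. The counts could then be handed to Matplotlib or any other plotting tool.
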
For those of us who still like a good book, here are a couple of
suggestions for R (even if it's just for under the covers with a torch
at night).

As was said, there are many books on R out there.  I've looked at most 
(OK, own - I should get out more) and my favourite introduction is

A First Course in Statistical Programming with R  by Braun and Murdoch

It's clearly written with lots of examples and packs a lot into 160-odd
pages.

On a grander scale there is

The R Book, by Michael Crawley

At 942 pages this is billed as a comprehensive reference manual for R.  
Well-written with lots of examples.  Not cheap but I got a lot of use 
from it when I was starting out.
