mahout-user mailing list archives

From Brian Clark <brian.cla...@btinternet.com>
Subject Re: Any visualization scripts for graphing DataModel stats?
Date Mon, 28 Mar 2011 11:57:45 GMT
On 28/03/2011 06:51, Jeremy Lewi wrote:
> Another option is Python+MatPlotlib+Numpy. For matlab users, Matplotlib
> provides equivalent plotting routines with nearly identical syntax.
>
> One of the reasons I spent time looking into JPype+Python+Mahout was so
> that I could visualize/inspect the output generated by Mahout (e.g.
> Vectors stored in sequence files) without having to convert to an
> intermediary format such as csv.
>
> J
> On Sun, 2011-03-27 at 21:37 -0700, Dmitriy Lyubimov wrote:
>> R is good.
>>
>> RapidMiner has tons of visualizations and presumably might be less of
>> a learning curve than R, but it would only work on modest datasets or subsamples.
>>
>> On Sat, Mar 26, 2011 at 11:59 AM, Dan Brickley<danbri@danbri.org>  wrote:
>>> Hi
>>>
>>> Cutting across from M.I.A. forum -
>>> http://www.manning-sandbox.com/thread.jspa?threadID=42476&tstart=0
>>>
>>> I've loaded a pile of ratings into Mahout and started tweaking a dozen or so
>>> flavours of Recommender with different components, settings. This is great,
>>> I'm getting somewhere and Mahout works.
>>>
>>> However this is a new dataset for me and I've not yet got a good feel for
>>> "what's in there". Since Mahout's datamodel CSV format is simple and
>>> regular, I suspect various other folk on this list already have utilities
>>> that consume it, and -being lazy- I thought I'd ask before blundering in and
>>> making my own. The kinds of question I have in mind are fairly pedestrian
>>> for now --- what the spread of rating values looks like, how many of the
>>> items have, say, 5 or more ratings; how many are super-popular and so on.
>>>
>>> I started toying with [learning] R for this, but before digging further --
>>> am I retreading known ground? Are there any scripts shared already? (I
>>> didn't manage to find much by searching). Does it make sense to have shared
>>> utilities for poking around inside a FileDataModel?
>>>
>>> Thanks for suggestions, pointers etc
>>>
>>> cheers,
>>>
>>> Dan
>>>
>>> ps. started learning R ->
>>>
>>>> ratings<- read.csv('2010ratingtests-datamodel.csv', sep=',')
>>>> names(ratings)<-c("userid","itemid","pref")
>>>> summary(ratings$pref)
>>>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
>>>   1.000   7.000   8.000   8.022  10.000  10.000
>>>> library(lattice)
>>>> histogram(ratings$pref)
>
>
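The questions in Dan's quoted message (the spread of rating values, how many items have 5 or more ratings, which items are super-popular) can also be answered with a few lines of Python, one of the options raised in the thread. The following is only a minimal sketch, not a shared Mahout utility: it assumes a headerless userid,itemid,pref CSV like the one read in the R session, and the function name rating_stats and its parameters are made up for illustration.

```python
import csv
from collections import Counter


def rating_stats(path, min_ratings=5, top_n=10):
    """Summarise a userid,itemid,pref CSV file (assumed layout, no header row)."""
    pref_counts = Counter()  # rating value -> how often it occurs
    item_counts = Counter()  # item id -> number of ratings it received
    with open(path, newline="") as handle:
        for row in csv.reader(handle):
            if len(row) < 3:
                continue  # skip blank or incomplete lines
            item_id, pref = row[1], float(row[2])
            pref_counts[pref] += 1
            item_counts[item_id] += 1
    return {
        # spread of rating values, sorted by rating
        "pref_histogram": dict(sorted(pref_counts.items())),
        # number of distinct items seen
        "items_total": len(item_counts),
        # how many items have at least min_ratings ratings
        "items_with_min_ratings": sum(
            1 for count in item_counts.values() if count >= min_ratings
        ),
        # the most heavily rated items as (item id, count) pairs
        "most_rated_items": item_counts.most_common(top_n),
    }
```

Calling rating_stats('2010ratingtests-datamodel.csv') would return a dictionary with the rating histogram and the item counts; the histogram could then be handed to whatever plotting tool is preferred.
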
For those of us who still like a good book, here are a couple of
suggestions for R (even if it's just for under the covers with a torch
at night).

As was said, there are many books on R out there.  I've looked at most 
(OK, own - I should get out more) and my favourite introduction is

A First Course in Statistical Programming with R, by Braun and Murdoch

It's clearly written, with lots of examples, and packs a lot into
160-odd pages.

On a grander scale there is

The R Book, by Michael Crawley

At 942 pages this is billed as a comprehensive reference manual for R.  
Well-written with lots of examples.  Not cheap but I got a lot of use 
from it when I was starting out.

Regards,

Brian
