mahout-user mailing list archives

From Jeremy Lewi <jer...@lewi.us>
Subject Re: Any visualization scripts for graphing DataModel stats?
Date Mon, 28 Mar 2011 05:51:06 GMT
Another option is Python+Matplotlib+NumPy. For MATLAB users, Matplotlib
provides equivalent plotting routines with nearly identical syntax.
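
As a rough illustration of the Python route, here is a minimal sketch that
answers the kinds of questions Dan asks below (spread of rating values, how
many items have 5 or more ratings). It assumes the datamodel CSV has
userid,itemid,pref columns with no header row, and that blank lines and lines
starting with '#' can be skipped. The function names are made up for this
example, and the plotting step assumes Matplotlib is installed; the counting
itself uses only the standard library.

```python
import csv
from collections import Counter


def load_ratings(lines):
    """Parse userid,itemid,pref rows (assumed layout, no header row)."""
    ratings = []
    for row in csv.reader(lines):
        # Skip blank lines and comment lines (an assumption about the file).
        if not row or row[0].startswith("#"):
            continue
        # Any extra columns (e.g. a timestamp) are ignored.
        ratings.append((row[0], row[1], float(row[2])))
    return ratings


def ratings_per_item(ratings):
    """Count how many ratings each item received."""
    return Counter(item for _user, item, _pref in ratings)


def items_with_at_least(counts, n):
    """Number of items that have n or more ratings."""
    return sum(1 for c in counts.values() if c >= n)


def plot_histograms(ratings, counts):
    """Histogram of rating values and of ratings-per-item (needs Matplotlib)."""
    import matplotlib.pyplot as plt  # imported lazily so counting works without it

    fig, (ax_pref, ax_pop) = plt.subplots(1, 2, figsize=(10, 4))
    ax_pref.hist([pref for _u, _i, pref in ratings], bins=10)
    ax_pref.set_xlabel("rating value")
    ax_pref.set_ylabel("number of ratings")
    # Log-scaled counts make the long tail of rarely rated items visible.
    ax_pop.hist(list(counts.values()), bins=50, log=True)
    ax_pop.set_xlabel("ratings per item")
    ax_pop.set_ylabel("number of items (log scale)")
    plt.show()


if __name__ == "__main__":
    # File name taken from the R session quoted below.
    with open("2010ratingtests-datamodel.csv", newline="") as f:
        ratings = load_ratings(f)
    counts = ratings_per_item(ratings)
    print("items with 5 or more ratings:", items_with_at_least(counts, 5))
    print("most-rated items:", counts.most_common(10))
    plot_histograms(ratings, counts)
```

Counter.most_common() gives a quick view of the "super-popular" items, and the
threshold passed to items_with_at_least() can be varied to see how quickly the
tail falls off.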

One of the reasons I spent time looking into JPype+Python+Mahout was so
that I could visualize/inspect the output generated by Mahout (e.g.
Vectors stored in sequence files) without having to convert to an
intermediary format such as csv.

J
On Sun, 2011-03-27 at 21:37 -0700, Dmitriy Lyubimov wrote:
> R is good.
> 
> RapidMiner has tons of visualizations and presumably might be less of
> a learning curve than R, but it would work for modest datasets or subsamples.
> 
> On Sat, Mar 26, 2011 at 11:59 AM, Dan Brickley <danbri@danbri.org> wrote:
> > Hi
> >
> > Cutting across from M.I.A. forum -
> > http://www.manning-sandbox.com/thread.jspa?threadID=42476&tstart=0
> >
> > I've loaded a pile of ratings into Mahout and started tweaking a dozen or so
> > flavours of Recommender with different components, settings. This is great,
> > I'm getting somewhere and Mahout works.
> >
> > However this is a new dataset for me and I've not yet got a good feel for
> > "what's in there". Since Mahout's datamodel CSV format is simple and
> > regular, I suspect various other folk on this list already have utilities
> > that consume it, and -being lazy- I thought I'd ask before blundering in and
> > making my own. The kinds of question I have in mind are fairly pedestrian
> > for now --- what the spread of rating values looks like, how many of the
> > items have, say, 5 or more ratings; how many are super-popular and so on.
> >
> > I started toying with [learning] R for this, but before digging further --
> > am I retreading known ground? Are there any scripts shared already? (I
> > didn't manage to find much by searching). Does it make sense to have shared
> > utilities for poking around inside a FileDataModel?
> >
> > Thanks for suggestions, pointers etc
> >
> > cheers,
> >
> > Dan
> >
> > ps. started learning R ->
> >
> >> ratings <- read.csv('2010ratingtests-datamodel.csv', sep=',')
> >> names(ratings) <-c("userid","itemid","pref")
> >> summary(ratings$pref)
> >   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
> >  1.000   7.000   8.000   8.022  10.000  10.000
> >> library(lattice)
> >> histogram(ratings$pref)
> >

