mahout-user mailing list archives

From Alex Kozlov <ale...@cloudera.com>
Subject Re: Frequent itemset mining
Date Wed, 06 Jun 2012 02:19:35 GMT
The documentation says:

Running parallel FPGrowth is as easy as changing the flag to -method
mapreduce and adding the number of groups parameter, e.g. -g 20 for 20
groups. First, let's run the above sample test in map-reduce mode:

bin/mahout fpg \
     -i core/src/test/resources/retail.dat \
     -o patterns \
     -k 50 \
     -method mapreduce \
     -regex '[\ ]' \
     -s 2

The above test took 102 seconds on a dual-core laptop, vs. 609 seconds in
sequential mode (with 5 GB of RAM allocated). In a separate test,
the first 1000 lines of retail.dat took 20 seconds in map/reduce vs. 30
seconds in sequential mode.

Running the example above, I get times more like hours (with both the
sequential and mapreduce methods) on 48GB boxes.  Am I doing something
wrong?  Should it be minutes instead of seconds?
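
For anyone who wants to reproduce the comparison, here is a minimal sketch of a wrapper that runs the same fpg command in either mode and reports wall-clock time. The names MAHOUT_BIN, run_fpg and timed_fpg are made up for this sketch, and I am assuming "sequential" is the value -method accepts for the non-parallel mode; the other flags are copied from the documented example.

```shell
#!/bin/sh
# Hypothetical helper names (MAHOUT_BIN, run_fpg, timed_fpg) are for illustration only.
# Path to the mahout launcher; override it if Mahout lives elsewhere.
MAHOUT_BIN="${MAHOUT_BIN:-bin/mahout}"

# run_fpg METHOD OUTPUT_DIR [extra fpg args...]
# Runs the documented retail.dat example with the given -method value.
run_fpg() {
    method="$1"
    out="$2"
    shift 2
    "$MAHOUT_BIN" fpg \
        -i core/src/test/resources/retail.dat \
        -o "$out" \
        -k 50 \
        -method "$method" \
        -regex '[\ ]' \
        -s 2 \
        "$@"
}

# timed_fpg METHOD OUTPUT_DIR [extra fpg args...]
# Same as run_fpg, but prints elapsed seconds and exit status to stderr.
timed_fpg() {
    start=$(date +%s)
    run_fpg "$@"
    status=$?
    end=$(date +%s)
    echo "fpg -method $1 finished in $((end - start))s (exit $status)" >&2
    return $status
}
```

Usage would be something like `timed_fpg sequential patterns-seq` followed by `timed_fpg mapreduce patterns-mr -g 20`, using separate output directories so the two runs do not overwrite each other.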
--
Alex K

On Mon, Dec 5, 2011 at 12:50 PM, Isabel Drost <isabel@apache.org> wrote:

> On 02.12.2011 Tom Pierce wrote:
> > These programs are actually exposed through the main mahout program; if
> > you run:
> >
> > $MAHOUT_HOME/bin/mahout fpg
> >
> > it will run the Frequent Pattern Growth algorithm (aka frequent itemset
> > mining).
>
> Also there is quite some documentation on the wiki:
>
> https://cwiki.apache.org/MAHOUT/parallel-frequent-pattern-mining.html
> (also includes a link to the original research publication).
>
> Isabel
>
>
