mahout-user mailing list archives

From "Björn Jacobs"
Subject PFPGrowth on cluster does not distribute work load equally on nodes
Date Wed, 16 Jun 2010 15:26:32 GMT
Hello everyone!

I am trying to get familiar with PFPGrowth in the Mahout packages. I am planning to adapt this
code to run a parallelized subgroup discovery. This is, by the way, the aim of my bachelor
thesis, which I am currently writing.

I'm having the problem that the algorithm does not distribute the workload evenly across the
nodes in my cluster. I have 10 nodes and I set the as well as the mapred.reduce.tasks

My problem is that the "PFP Growth Driver running over input/test002/sortedoutput" job did
the following:

Node 0 got nearly 100% of the work (finished in 20 minutes)
Nodes 1-3 got a very small piece (finished in less than 10 seconds)
Nodes 4-14 got nothing and finished execution immediately

As a result, one node had to do all the work while the others sat idle, and the job took a
very long time to finish. That's not parallel.

Is this a bug or do I have to configure something to get this working?
Thanks a lot!

Björn Jacobs