mahout-user mailing list archives

From Markus Weimer <>
Subject Re: Multiple data-local passes?
Date Thu, 28 Jan 2010 23:36:58 GMT
Hi Ted,

> If you are running SGD on a single node, just open the HDFS files directly.
> You won't have significant benefit to locality unless the files are
> relatively small.

Good point. However, whether it applies may depend on the network
topology of the cluster:

Reasonably fast implementations of SGD are bandwidth-bound even when
reading from local disk on typical machines. Depending on how the
cluster is wired, rack-local bandwidth may be an order of magnitude
higher than the bandwidth you get when reading from a node in another
rack. So I believe data locality does matter for SGD.

Your point is, of course, universally true for sequential algorithms
that are CPU-bound, such as batch learning schemes.
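To make the argument concrete, here is a back-of-the-envelope sketch in Python. It models one sequential pass as taking max(I/O time, compute time), assuming reading and computation overlap perfectly. Every figure in it (data size, bandwidths, per-example cost) is a hypothetical placeholder, not a measurement, and the function names are made up for illustration.

```python
# Back-of-the-envelope model of one sequential pass over the data.
# All numbers below are hypothetical placeholders, not measurements.

def pass_time_s(data_bytes, read_bytes_per_s, n_examples, cpu_s_per_example):
    """Wall-clock seconds for one pass, assuming reading and computing
    overlap perfectly, so whichever is slower sets the pace."""
    io_s = data_bytes / read_bytes_per_s
    cpu_s = n_examples * cpu_s_per_example
    return max(io_s, cpu_s)

DATA_BYTES = 100 * 10**9        # 100 GB of training data (hypothetical)
N_EXAMPLES = 10**9              # i.e. 100 bytes per example (hypothetical)
RACK_LOCAL_BPS = 100 * 10**6    # 100 MB/s rack-local read rate (hypothetical)
OFF_RACK_BPS = 10 * 10**6       # 10x slower cross-rack read rate (hypothetical)

# A cheap SGD update vs. a more expensive per-example batch computation.
SGD_CPU_S = 50e-9               # 50 ns per example (hypothetical)
BATCH_CPU_S = 20e-6             # 20 us per example (hypothetical)

def locality_speedup(cpu_s_per_example):
    """Ratio of off-rack pass time to rack-local pass time."""
    local = pass_time_s(DATA_BYTES, RACK_LOCAL_BPS, N_EXAMPLES, cpu_s_per_example)
    remote = pass_time_s(DATA_BYTES, OFF_RACK_BPS, N_EXAMPLES, cpu_s_per_example)
    return remote / local

if __name__ == "__main__":
    print(f"SGD:   {locality_speedup(SGD_CPU_S):.1f}x faster with locality")
    print(f"Batch: {locality_speedup(BATCH_CPU_S):.1f}x faster with locality")
```

With these made-up numbers the cheap SGD update leaves the pass I/O-bound, so rack-local reads shorten it by the full bandwidth ratio, while the costlier per-example work keeps the batch-style pass CPU-bound and locality makes no difference. The perfect-overlap assumption is a simplification; real pipelines land somewhere between max() and the sum of the two times.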

Take care,

