mahout-user mailing list archives

From Isabel Drost <>
Subject Re: Load Dataset and Instances from database
Date Fri, 25 Nov 2011 12:46:02 GMT
On 24.11.2011 Ted Dunning wrote:
> Actually, one of the most reliable ways to kill a database is to use it as
> input or output for even a small Hadoop cluster.  Having hundreds of
> processes all open connections and read at once is fairly abusive.

Though that does not mean that data cannot be synced to HDFS before being used
in a map/reduce job. Tools like Sqoop help with that.
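
A rough sketch of such a sync step, not taken from this thread: the snippet below only assembles a `sqoop import` command line with a capped number of parallel mappers, so the database sees a handful of connections rather than hundreds. The JDBC URL, user name, table name, and target directory are hypothetical placeholders, and the helper `sqoop_import_command` is made up for illustration.

```python
import shlex


def sqoop_import_command(jdbc_url, table, target_dir, username, num_mappers=4):
    """Build the argument list for a one-off `sqoop import` into HDFS.

    num_mappers caps how many parallel tasks (and therefore database
    connections) the import opens.
    """
    if num_mappers < 1:
        raise ValueError("num_mappers must be at least 1")
    return [
        "sqoop", "import",
        "--connect", jdbc_url,        # JDBC connection string of the source database
        "--username", username,
        "-P",                         # prompt for the password on the console
        "--table", table,             # table to copy
        "--target-dir", target_dir,   # HDFS directory that receives the data
        "--num-mappers", str(num_mappers),
    ]


# All values below are assumed placeholders, not real hosts or paths.
cmd = sqoop_import_command(
    jdbc_url="jdbc:mysql://db.example.com/shop",
    table="ratings",
    target_dir="/user/mahout/ratings",
    username="reader",
)
print(" ".join(shlex.quote(arg) for arg in cmd))
```

The map/reduce job then reads from the target directory on HDFS and never touches the database itself.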

