mahout-user mailing list archives

From james q <>
Subject Memory Issue with KMeans clustering
Date Fri, 04 Feb 2011 05:05:22 GMT

New user to Mahout and Hadoop here. Isabel Drost suggested to a colleague that
I post to the Mahout user list, as I am having some general difficulties
with memory consumption and KMeans clustering.

So a general question first and foremost: what determines how much memory a
map task consumes during a KMeans clustering job? Increasing the number of
map tasks by adjusting dfs.block.size and mapred.max.split.size doesn't seem
to make each map task consume less memory, or at least not by a noticeable
amount. I figured that with more map tasks, each individual map task would
evaluate fewer input keys, and hence there would be less
memory consumption. Is there any way to predict memory usage of map tasks in

The cluster I am running consists of 10 machines, each with 8 cores and 68 GB
of RAM. I've configured the cluster so that each machine runs at most 7
map or reduce tasks. I set the map and reduce tasks to have virtually no
limit on memory consumption ... so with 7 processes each, at around 9-10 GB
per process, the machines will crap out. I can reduce the number of map
tasks per machine, but something tells me that this level of memory
consumption is wrong.
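
As a rough check on the figures above, here is a minimal sketch of the per-machine arithmetic. The function name is hypothetical, and it assumes 7 concurrent tasks per machine with a fixed footprint per task; it ignores memory used by the OS and the Hadoop daemons.

```python
def node_memory_demand_gb(tasks_per_node, gb_per_task):
    """Total memory demanded on one machine when every task slot is busy."""
    return tasks_per_node * gb_per_task

RAM_GB = 68      # RAM per machine, from the description above
TASK_SLOTS = 7   # maximum concurrent map or reduce tasks per machine

# Observed footprint was roughly 9 to 10 GB per process.
low = node_memory_demand_gb(TASK_SLOTS, 9)
high = node_memory_demand_gb(TASK_SLOTS, 10)

# Largest per-task footprint that still fits with all slots busy.
# The real ceiling is lower, since the OS and daemons need memory too.
max_gb_per_task = RAM_GB / TASK_SLOTS

print(f"demand: {low}-{high} GB of {RAM_GB} GB available")
print(f"per-task ceiling: {max_gb_per_task:.1f} GB")
```

At the upper end of the observed range, the demand exceeds the physical RAM on the machine, which is consistent with the failures described.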

If any more information is needed to help debug this, please let me know!

-- james
