mahout-user mailing list archives

From Ted Dunning <ted.dunn...@gmail.com>
Subject Re: Memory Issue with KMeans clustering
Date Fri, 04 Feb 2011 06:55:04 GMT
How many clusters?

How large is the dimension of your input data?
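These two questions point at the dominant cost: each k-means map task holds the full set of k centroids (each a vector of the input dimensionality) in memory, independent of how the input is split. A rough back-of-envelope sketch, assuming dense double-precision centroids and a factor of two for the per-cluster running sums kept during accumulation (the constants here are illustrative, not measured from Mahout):

```java
// Hypothetical estimate of per-map-task memory for k-means clustering.
// Assumes dense centroids stored as doubles (8 bytes per component) and
// roughly 2x overhead for the running-sum accumulators kept per cluster.
public class KMeansMemoryEstimate {

    static long estimateBytes(long k, long d) {
        // k centroids * d doubles * 8 bytes, doubled for accumulators
        return 2L * k * d * 8L;
    }

    public static void main(String[] args) {
        long k = 10_000;   // hypothetical cluster count
        long d = 100_000;  // hypothetical input dimensionality
        System.out.printf("~%.1f GB per map task%n",
                estimateBytes(k, d) / 1e9);
    }
}
```

With numbers like these (10,000 clusters over 100,000 dimensions), each map task would need on the order of 16 GB just for cluster state, which would explain per-task memory that does not shrink as you add more map tasks.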

On Thu, Feb 3, 2011 at 9:05 PM, james q <james.quacinella@gmail.com> wrote:

> Hello,
>
> New user to mahout and hadoop here. Isabel Drost suggested to a colleague
> that I post to the mahout user list, as I am having some general
> difficulties with memory consumption and KMeans clustering.
>
> So a general question first and foremost: what determines how much memory
> a map task consumes during a KMeans clustering job? Increasing the
> number of map tasks by adjusting dfs.block.size and mapred.max.split.size
> doesn't seem to make each map task consume less memory, or at least not
> by a noticeable amount. I figured that with more map tasks, each
> individual map task evaluates fewer input keys and hence would consume
> less memory. Is there any way to predict the memory usage of map tasks in
> KMeans?
>
> The cluster I am running consists of 10 machines, each with 8 cores and 68G
> of RAM. I've configured the cluster so that each machine runs at most 7
> map or reduce tasks. I set the map and reduce tasks to have virtually no
> limit on memory consumption, so with 7 processes each, at around 9 - 10G
> per process, the machines will crap out. I can reduce the number of map
> tasks per machine, but something tells me that that level of memory
> consumption is wrong.
>
> If any more information is needed to help debug this, please let me know!
> Thanks!
>
> -- james
>
