mahout-user mailing list archives

From: Sean Owen <sro...@gmail.com>
Subject: Re: mapreduce memory issues
Date: Wed, 05 May 2010 18:11:57 GMT
I think it's UserVectorToCooccurrenceMapper, which keeps a local count
of how many times each item has been seen. On a small cluster with only
a few mappers, each mapper sees all the items, so each one ends up
holding a count for every item. That's still not terrible, but it could
take up a fair bit of memory.
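
In outline, the counting pattern is something like this (just a sketch
to illustrate the growth, not the actual mapper code; I'm using a plain
HashMap and raw item indices to keep it self-contained):

  import java.util.HashMap;
  import java.util.Map;

  private final Map<Integer,Integer> indexCounts = new HashMap<Integer,Integer>();

  // Every item index that appears in any user vector bumps an in-memory
  // counter, so a mapper that sees all items ends up with one map entry
  // per item by the end of its run.
  private void countSeen(int[] itemIndices) {
    for (int index : itemIndices) {
      Integer count = indexCounts.get(index);
      indexCounts.put(index, count == null ? 1 : count + 1);
    }
  }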

One easy solution is to cap the map's size and periodically throw out low-count entries.
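
For instance, something along these lines (purely hypothetical; the cap
and minimum count are made-up numbers, and it builds on the indexCounts
map sketched above):

  import java.util.Iterator;
  import java.util.Map;

  private static final int CAP = 1000000;
  private static final int MIN_COUNT = 2;

  // Hypothetical pruning step: once the map grows past the cap, drop
  // entries whose counts are still small. We lose exact counts for rare
  // items, but the mapper's memory use stays bounded.
  private void maybePrune() {
    if (indexCounts.size() > CAP) {
      Iterator<Map.Entry<Integer,Integer>> it = indexCounts.entrySet().iterator();
      while (it.hasNext()) {
        if (it.next().getValue() < MIN_COUNT) {
          it.remove();
        }
      }
    }
  }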

Just to confirm this is the issue, you could hack in this line:

  private void countSeen(Vector userVector) {
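    // hack: stop counting anything new once the map holds a million entries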
    if (indexCounts.size() > 1000000) return;
    ...

That's not a real solution, but it's an easy way to confirm for
everyone whether that's the problem. If it is, I can solve it in a more
robust way.

On Wed, May 5, 2010 at 7:03 PM, Tamas Jambor <jamborta@googlemail.com> wrote:
> Hi,
>
> I came across a new problem with the mapreduce implementation. I am trying
> to optimize the cluster for this implementation, but in order to run
> RecommenderJob-UserVectorToCooccurrenceMapper-UserVectorToCooccurrenceReducer,
> I need to set -Xmx2048m; with a smaller value the job fails. How come it
> needs so much memory? Maybe there is a memory leak here? Generally it is
> suggested to set -Xmx512m.
>
> The other problem with setting it so high is that I have to reduce the
> number of map/reduce tasks per node, otherwise the next job brings the whole
> cluster down.
>
> Tamas
>
>
