mahout-user mailing list archives

From Sean Owen <sro...@gmail.com>
Subject Re: SSVD fails on seq2sparse output.
Date Mon, 19 Nov 2012 10:22:16 GMT
(Yes, it is a Java binary requiring Java 6+. It runs against Hadoop
0.20.x - 2.0.x or work-alikes, or Amazon EMR. In this implementation
the work is in the reducer, so you would need to give the reducers the
extra memory instead of the mappers. I think you can run the whole 20M
rows of input in Myrrix, with your given ~5GB per *reducer*, if you
turn up the number of reducers a bit, to -Dmapred.reduce.tasks=16 or
so. It gets away with more because of a few tricks, like using floats
and further partitioning the matrices.)
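
For concreteness, a job submission along these lines would pass both
settings; the jar and class names are placeholders, -Xmx5g is just
your figure, and this assumes the job picks up -D options via
ToolRunner:

  hadoop jar your-job.jar YourJobClass \
      -Dmapred.reduce.tasks=16 \
      -Dmapred.child.java.opts=-Xmx5g \
      input/ output/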


Either way, the issue with a large U is not in computing *U* but in
computing M from U, since that is when U has to be in memory.

While it's just a gut guess, 20 features sounds quite low relative to
the cardinality of the input; 50-100 is more usual, but yes, you are
memory-constrained. Your rule of thumb is about right for this
implementation, as it uses 8-byte doubles. There's other overhead in
the data structure, and other data structures in memory too, of course.
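
To put numbers on that rule of thumb, with your 20M rows and 20
features, U alone is roughly:

  20,000,000 rows x 20 features x 8 bytes/double ≈ 3.2 GB

before any of that data-structure overhead, which is why ~5GB of heap
is already tight even at only 20 features.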

Note that in this context you need to constrain Hadoop not to put 2
mappers on one machine if 2x the heap used doesn't fit in that
machine's physical memory. It will fail, or at least you will get bad
swapping: mapred.tasktracker.map.tasks.maximum=1 (Same idea if you
were using big reducers.)
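
That one is a per-tasktracker setting rather than a per-job one, so it
goes in mapred-site.xml on each worker node (and needs a tasktracker
restart), something like:

  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>1</value>
  </property>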

In apps like this, where most of the RAM is used by long-lived
objects, you can squeeze more out of a given amount of heap by turning
up the new ratio: -XX:NewRatio=12 or even higher. Otherwise you "run
out" of heap while a fair bit of room is still available but reserved
for new, short-lived objects.
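
To make that concrete: -XX:NewRatio=12 means an old:young split of
12:1, so with a 5GB heap the young generation shrinks to about
5GB / 13 ≈ 390MB, leaving roughly 4.6GB for the long-lived matrices,
versus about 3.3GB at the common server default of NewRatio=2.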

While it won't make much difference, I recommend -XX:+UseParallelOldGC,
and I do not recommend disabling UseGCOverheadLimit! The JVM should
also already have useful options like -XX:+UseCompressedOops on by
default with the latest Java versions and this heap size.
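
Putting the JVM options together, the task opts could look like this
(again, -Xmx5g is just your figure, and UseCompressedOops shouldn't
need to be passed explicitly):

  -Dmapred.child.java.opts='-Xmx5g -XX:NewRatio=12 -XX:+UseParallelOldGC'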


On Mon, Nov 19, 2012 at 8:29 AM, Abramov Pavel <p.abramov@rambler-co.ru> wrote:
>
> Can the Myrrix Computation Layer run on FreeBSD? Yes, we use Hadoop
> with FreeBSD )
>
> Regards,
> Pavel
>
>
