mahout-user mailing list archives

From Abramov Pavel <p.abra...@rambler-co.ru>
Subject Re: SSVD fails on seq2sparse output.
Date Mon, 19 Nov 2012 08:36:09 GMT
About 20,000,000 users and 150,000 items, with 0.03% non-zero entries. 20
features required.
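
For scale, a quick back-of-the-envelope sizing of these dimensions (a sketch
assuming 8-byte double entries; it ignores JVM object overhead, which can be
substantial in practice):

```python
# Sizing sketch for the dataset above: 20M users, 150K items,
# 0.03% non-zeros, 20 latent features, 8-byte doubles.
users, items, features = 20_000_000, 150_000, 20
density_num, density_den = 3, 10_000   # 0.03% expressed as an exact fraction

nonzeros = users * items * density_num // density_den  # ratings in R
u_bytes = users * features * 8                         # dense user-factor matrix U
m_bytes = items * features * 8                         # dense item-factor matrix M

print(f"non-zeros in R : {nonzeros:,}")            # 900,000,000
print(f"U matrix       : {u_bytes / 1e9:.1f} GB")  # 3.2 GB
print(f"M matrix       : {m_bytes / 1e6:.1f} MB")  # 24.0 MB
```

This is why M fits comfortably in a mapper's heap while U does not.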

Pavel

On 19.11.12 at 12:31, "Sebastian Schelter" <ssc@apache.org> wrote:

>You need to give much more memory than 200 MB to your mappers. What are
>the dimensions of your input in terms of users and items?
>
>--sebastian
>
>On 19.11.2012 09:28, Abramov Pavel wrote:
>> Thanks for your replies.
>> 
>> 1) 
>>> Can you describe your failure or give us a strack trace?
>> 
>> 
>> Here is job log:
>> 
>> 12/11/19 09:54:07 INFO als.ParallelALSFactorizationJob: Recomputing U
>> (iteration 0/15)
>> …
>> 12/11/19 10:03:31 INFO mapred.JobClient: Job complete:
>> job_201211150152_1671
>> 12/11/19 10:03:31 INFO als.ParallelALSFactorizationJob: Recomputing M
>> (iteration 0/15)
>> …
>> 12/11/19 10:10:04 INFO mapred.JobClient: Task Id :
>> attempt_201211150152_<*ALL*>, Status : FAILED
>> …
>> 12/11/19 10:40:40 INFO mapred.JobClient:     Failed map tasks=1
>> 
>> 
>> 
>> All of these mappers (Recomputing M on the 1st iteration) fail with a
>> "Java heap space" error.
>> 
>> Here is Hadoop job memory config:
>> 
>> mapred.map.child.java.opts = -Xmx5024m -XX:-UseGCOverheadLimit
>> mapred.child.java.opts = -Xmx200m
>> mapred.job.reuse.jvm.num.tasks = -1
>> 
>> 
>> mapred.cluster.reduce.memory.mb = -1
>> mapred.cluster.map.memory.mb = -1
>> mapred.cluster.max.reduce.memory.mb = -1
>> mapred.job.reduce.memory.mb = -1
>> mapred.job.map.memory.mb = -1
>> mapred.cluster.max.map.memory.mb = -1
>> 
>> Are any tweaks possible? Is mapred.map.child.java.opts the right property
>> to set?
>> 
>> 2) As far as I understand, ALS cannot load the U matrix into RAM (20M
>> users), while M is fine (150K items). Can I split the input matrix R
>> (keeping all items, splitting by user) into R1, R2, ..., Rn, then compute
>> M and U1 on R1 (many iterations, then fix M), and then compute U2, U3,
>> ..., Un using the existing M (half an iteration, without recomputing M)?
>> I want to do this to avoid memory issues (training on one part at a
>> time).
>> My question is: will all the users from U1, U2, ..., Un "exist" in the
>> same feature space? Can I then compare users from U1 with users from U2
>> using their features?
>> Is any tweak possible here?
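
The fixed-M scheme in (2) can be illustrated with a tiny standalone sketch
(pure Python with made-up numbers, not Mahout code): once M is held fixed,
each user's vector is the solution of an independent regularized
least-squares system, so users computed from different splits R1..Rn all
live in the feature space defined by that same M and are directly
comparable.

```python
# Hypothetical fold-in sketch with 2 features: for fixed item factors M,
# each user vector u solves (M_u^T M_u + lambda * n_u * I) u = M_u^T r_u,
# where M_u stacks the factors of the items that user rated.

def solve2(a, b, c, d, e, f):
    # Solve the 2x2 system [[a, b], [c, d]] x = [e, f] by Cramer's rule.
    det = a * d - b * c
    return ((e * d - b * f) / det, (a * f - e * c) / det)

def fold_in(ratings, M, lam=0.05):
    """Compute one user's 2-feature vector from (item, rating) pairs,
    keeping the item factors M fixed (half an ALS iteration)."""
    n = len(ratings)
    # Accumulate the entries of M_u^T M_u and M_u^T r_u.
    a = b = d = e = f = 0.0
    for item, r in ratings:
        m1, m2 = M[item]
        a += m1 * m1; b += m1 * m2; d += m2 * m2
        e += m1 * r;  f += m2 * r
    a += lam * n; d += lam * n   # ridge term lambda * n_u * I
    return solve2(a, b, b, d, e, f)

# Two users from "different splits", folded in against the same fixed M:
M = {0: (0.9, 0.1), 1: (0.2, 0.8), 2: (0.5, 0.5)}
u1 = fold_in([(0, 5.0), (2, 3.0)], M)
u2 = fold_in([(1, 4.0), (2, 2.0)], M)
# Both vectors refer to the same item factors, so comparing u1 and u2
# (e.g. by dot product or cosine) is meaningful.
```

Note this only answers the feature-space question; M itself is still trained
only on R1 in this scheme, so it sees a subset of the users.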
>> 
>> 3) How can I calculate the maximum matrix size for a given item count and
>> memory limit? For example, my matrix has 20M users and I want to
>> factorize it using 20 features: 20M * 20 * 8 bytes = 3.2 GB. On the one
>> hand I want to avoid "Java heap space" errors; on the other hand I want
>> to feed my model as much training data as possible. I understand that
>> minor changes to ParallelALS may be needed.
>> 
>> Have a nice day!
>> 
>> 
>> Regards, 
>> Pavel

