mahout-user mailing list archives

From mohit kothari <>
Subject Storage and Running Time issue running LDA on Cluster
Date Wed, 30 May 2012 05:24:58 GMT

I am trying to run the LDA module on my private Hadoop cluster, which
consists of 11 datanodes. Each is a dual-core machine with 4 GB of RAM
and approx. 70 GB of storage remaining.

My input dataset consists of around 620k documents with a total size of
2.5 GB, on which I want to train LDA. There are 2 issues that I am
facing:

1) It is taking an enormous amount of time to learn, i.e. the map job
of my 1st iteration alone does not complete within a day.
2) My tasktrackers start giving Spill Fail errors once the disk is
completely full, and I have no clue what kind of temporary data the
mapper is storing.

Can anyone help me figure out what the issue could be and how to rectify it?

