mahout-user mailing list archives

From Grant Ingersoll <>
Subject SGD and memory
Date Tue, 03 Jan 2012 22:53:03 GMT
I'm trying to run the full ASF email SGD classifier problem and am running into heap size issues. My current setup has 105 features and I am using a cardinality of 100K. I'm using the AdaptiveLogisticRegression (ALR). The heap errors occur when constructing the ALR itself (i.e. not later, during training).

Just trying to check my math on memory:
ALR comes with 20 CrossFoldLearners (CFLs), and each of those comes with 5 OnlineLogisticRegression
instances, each of which has a DenseMatrix of (numFeatures - 1) x cardinality, plus some other, smaller, state.

This means, in my case, I have:
20 x 5 x (104 x 100,000 x sizeof(double)) = 20 x 5 x 83,200,000 bytes = 8,320,000,000 bytes = ~8.3 GB
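To double-check the arithmetic, here is a minimal Java sketch that multiplies out the same terms. The names `poolSize`, `foldsPerLearner`, `rows` and `cols` are illustrative only (not Mahout API), and the estimate counts just the dense coefficient matrices at 8 bytes per double, ignoring object headers and any per-learner vectors.

```java
// Rough estimate of heap used by the dense coefficient matrices in an ALR pool.
// Assumption: poolSize CFLs x foldsPerLearner OLRs, each holding a rows x cols
// matrix of doubles. Other per-learner state is deliberately ignored.
public class AlrMemoryEstimate {

    // A Java double occupies 8 bytes (64 bits).
    static final long BYTES_PER_DOUBLE = Double.BYTES;

    // Total bytes for all coefficient matrices; uses long math to avoid int overflow.
    static long betaBytes(int poolSize, int foldsPerLearner, long rows, long cols) {
        return (long) poolSize * foldsPerLearner * rows * cols * BYTES_PER_DOUBLE;
    }

    public static void main(String[] args) {
        // 20 CFLs x 5 folds x (104 x 100,000) doubles
        long bytes = betaBytes(20, 5, 104, 100_000);
        System.out.printf("%d bytes = %.2f GB (%.2f GiB)%n",
                bytes, bytes / 1e9, bytes / (double) (1L << 30));
    }
}
```

For these inputs it yields 8,320,000,000 bytes (about 8.3 GB, or roughly 7.7 GiB), and the total scales linearly with the pool size, so halving the number of CFLs halves this figure.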

Am I understanding the major parts of memory for ALR correctly? In other words, I need to
tone down the number of CFLs in the file so that I'm not using 20 of them, right?