mahout-user mailing list archives

From Lance Norskog <goks...@gmail.com>
Subject Re: SGD and memory
Date Tue, 03 Jan 2012 23:30:46 GMT
Do these algorithms have good locality? For doing giant online
computations it might be worth storing these in memory-mapped files.
Or, give up and get the M/R SGD code in.
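
As a rough illustration of the memory-mapped idea, here is a minimal sketch that keeps a weight matrix in a file and updates single cells in place, so the OS pages in only the regions that are actually accessed. This is not Mahout code (Mahout's DenseMatrix lives on the Java heap); the row-major layout of 8-byte doubles and the helper names (create_weight_file, sgd_update, read_weight, demo) are assumptions made for the example.

```python
import mmap
import os
import struct
import tempfile

# Assumption: weights are stored row-major as 8-byte little-endian doubles.
DOUBLE = struct.Struct("<d")


def create_weight_file(path, rows, cols):
    """Create a zero-filled file large enough for a rows x cols matrix."""
    with open(path, "wb") as f:
        f.truncate(rows * cols * DOUBLE.size)


def sgd_update(mm, cols, row, col, delta):
    """Add delta to the weight at (row, col) directly in the mapped file."""
    offset = (row * cols + col) * DOUBLE.size
    (value,) = DOUBLE.unpack_from(mm, offset)
    DOUBLE.pack_into(mm, offset, value + delta)


def read_weight(mm, cols, row, col):
    """Read the weight at (row, col) from the mapped file."""
    return DOUBLE.unpack_from(mm, (row * cols + col) * DOUBLE.size)[0]


def demo():
    """Apply two small updates to one cell and read back two cells."""
    rows, cols = 104, 1000  # small hypothetical sizes for the demo
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        create_weight_file(path, rows, cols)
        with open(path, "r+b") as f, mmap.mmap(f.fileno(), 0) as mm:
            sgd_update(mm, cols, 3, 42, 0.5)
            sgd_update(mm, cols, 3, 42, 0.25)
            return read_weight(mm, cols, 3, 42), read_weight(mm, cols, 0, 0)
    finally:
        os.remove(path)
```

Whether this helps depends on the locality asked about above: if each training example touches weights scattered across the whole matrix, the mapped file will page heavily.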

On Tue, Jan 3, 2012 at 2:59 PM, Ted Dunning <ted.dunning@gmail.com> wrote:
> Your math is correct.
>
> When you say you have 105 features, what do you mean?  Are these textual
> features?  Or what?
>
> On Tue, Jan 3, 2012 at 2:53 PM, Grant Ingersoll <gsingers@apache.org> wrote:
>
>> I'm trying to run the full ASF email SGD classifier problem and am facing
>> heap size issues.  My current setup has 105 features and I am using a
>> cardinality of 100K.  I'm using the AdaptiveLogisticRegression.  I'm
>> getting heap errors and they occur when trying to construct the ALR class
>> (i.e. not later during training).
>>
>> Just trying to check my math on memory:
>> ALR comes with 20 CrossFoldLearners (CFL) and each of those comes with 5
>> OnlineLogisticRegression instances, which each have a DenseMatrix of
>> (numFeatures -1) X cardinality, plus some other vectors.
>>
>> This means, in my case, I have:
>> 20 x 5 x (104 x 100,000 x sizeof(double)) = 66,560,000,000 bits = ~8.3 GB
>>
>> Am I understanding the major parts of memory for ALR correctly?  In other
>> words, I need to tone down the number of CFLs in the TrainASFEmail.java
>> file so as to not use 20 CFLs, right?
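
The estimate in the quoted message can be re-checked with a few lines of arithmetic. This is only a sketch: it counts the dense weight matrices described above and ignores the "other vectors" and JVM object overhead, it assumes 8-byte doubles, and the function name and parameter names are made up for the example.

```python
def alr_weight_bytes(num_learners=20, models_per_learner=5,
                     rows=104, cardinality=100_000, bytes_per_double=8):
    """Bytes used by the dense weight matrices alone.

    Defaults follow the numbers in the thread: 20 CrossFoldLearners,
    5 OnlineLogisticRegression models each, and a (105 - 1) x 100,000
    matrix of doubles per model.
    """
    return (num_learners * models_per_learner
            * rows * cardinality * bytes_per_double)


if __name__ == "__main__":
    total = alr_weight_bytes()
    print(f"{total:,} bytes = {total * 8:,} bits")
    print(f"{total / 1e9:.2f} GB ({total / 2**30:.2f} GiB)")
    # The cost is linear in the number of learners, so a smaller pool
    # shrinks the footprint proportionally.
    print(f"4 learners: {alr_weight_bytes(num_learners=4) / 1e9:.2f} GB")
```

Because every factor enters as a plain product, halving either the number of learners or the cardinality halves the estimate.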



-- 
Lance Norskog
goksron@gmail.com
