mahout-user mailing list archives

From Grant Ingersoll <gsing...@apache.org>
Subject Re: SGD and memory
Date Wed, 04 Jan 2012 03:58:29 GMT

On Jan 3, 2012, at 5:59 PM, Ted Dunning wrote:

> Your math is correct.
> 
> When you say you have 105 features, what do you mean?

Sorry, that should have been 105 categories/labels.  I'm trying to do the ASF email equivalent
of 20 Newsgroups, but in this case it's 105 ASF projects.  The basic task is to
predict which project an email belongs to based on its content.

>  Are these textual
> features?  Or what?
> 
> On Tue, Jan 3, 2012 at 2:53 PM, Grant Ingersoll <gsingers@apache.org> wrote:
> 
>> I'm trying to run the full ASF email SGD classifier problem and am facing
>> heap size issues.  My current setup has 105 features and I am using a
>> cardinality of 100K.  I'm using the AdaptiveLogisticRegression.  I'm
>> getting heap errors and they occur when trying to construct the ALR class
>> (i.e. not later during training).
>> 
>> Just trying to check my math on memory:
>> ALR comes with 20 CrossFoldLearners (CFL) and each of those comes with 5
>> OnlineLogisticRegression instances, which each have a DenseMatrix of
>> (numFeatures -1) X cardinality, plus some other vectors.
>> 
>> This means, in my case, I have:
>> 20 x 5 x (104 x 100,000 x sizeof(double)) = 66,560,000,000 bits = ~8.3 GB
>> 
>> Am I understanding the major parts of memory for ALR correctly?  In other
>> words, I need to tone down the number of CFLs in the TrainASFEmail.java
>> file so as to not use 20 CFLs, right?
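
A minimal sketch of the estimate above, for anyone who wants to plug in their own numbers. It only counts the dense weight matrices, (categories - 1) x cardinality doubles per OnlineLogisticRegression, and ignores the "other vectors" and any JVM object overhead, so treat the result as a lower bound. The class and method names are hypothetical and are not part of Mahout; the 20 learners and 5 folds are the counts quoted in the message. Evaluated with an 8-byte double, the expression comes to 8,320,000,000 bytes.

```java
public class AlrMemoryEstimate {
    /** Bytes per Java double (64 bits). */
    static final long BYTES_PER_DOUBLE = Double.BYTES;

    /**
     * Rough heap needed for the weight matrices alone.
     * Hypothetical helper: the parameter names are illustrative, not Mahout API.
     */
    static long estimateBytes(int learners, int foldsPerLearner,
                              int numCategories, int cardinality) {
        // One dense matrix of (numCategories - 1) rows by cardinality columns per model.
        long cellsPerModel = (long) (numCategories - 1) * cardinality;
        // Use long arithmetic throughout; the product overflows an int.
        return (long) learners * foldsPerLearner * cellsPerModel * BYTES_PER_DOUBLE;
    }

    public static void main(String[] args) {
        // Values from the message: 20 CFLs, 5 OLRs each, 105 categories, 100K cardinality.
        long bytes = estimateBytes(20, 5, 105, 100_000);
        System.out.printf("%,d bytes (%.2f GB)%n", bytes, bytes / 1e9);
    }
}
```

Since the total scales linearly in each factor, reducing the learner count or the cardinality shrinks the footprint by the same proportion.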


