From: Grant Ingersoll
Subject: Re: SGD and memory
Date: Tue, 3 Jan 2012 22:58:29 -0500
To: user@mahout.apache.org

On Jan 3, 2012, at 5:59 PM, Ted Dunning wrote:

> Your math is correct.
>
> When you say you have 105 features, what do you mean?

Sorry, that should have been 105 categories/labels. I'm trying to do the ASF email equivalent of 20 Newsgroups, but in this case it's 105 ASF projects.
The basic task is to try and predict what project an email belongs to based on its content.

> Are these textual
> features? Or what?
>
> On Tue, Jan 3, 2012 at 2:53 PM, Grant Ingersoll wrote:
>
>> I'm trying to run the full ASF email SGD classifier problem and am facing
>> heap size issues. My current setup has 105 features and I am using a
>> cardinality of 100K. I'm using the AdaptiveLogisticRegression. I'm
>> getting heap errors and they occur when trying to construct the ALR class
>> (i.e. not later during training).
>>
>> Just trying to check my math on memory:
>> ALR comes with 20 CrossFoldLearners (CFL) and each of those comes with 5
>> OnlineLogisticRegression instances, which each have a DenseMatrix of
>> (numFeatures - 1) x cardinality, plus some other vectors.
>>
>> This means, in my case, I have:
>> 20 x 5 x (104 x 100,000 x sizeof(double)) = 332,800,000,000 bits = ~39 GB
>>
>> Am I understanding the major parts of memory for ALR correctly? In other
>> words, I need to tone down the number of CFLs in the TrainASFEmail.java
>> file so as to not use 20 CFLs, right?
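
The memory estimate in the quoted message can be re-run as a short calculation. The sketch below is plain Python, not Mahout code: the function and parameter names are hypothetical, it assumes 8-byte doubles, and it counts only the dense weight matrices, not the "other vectors" mentioned in the thread. Evaluated exactly as written, the expression comes to 66,560,000,000 bits, or about 8.3 GB. That is one fifth of the 332,800,000,000 bits quoted above, so the quoted total may include a further factor of 5 that the expression does not show.

```python
# Back-of-the-envelope estimate of the dense weight matrices described in the
# quoted message. All names here are illustrative, not Mahout API.

BYTES_PER_DOUBLE = 8  # assumption: a Java double occupies 64 bits


def dense_weight_bytes(num_learners, models_per_learner, num_categories, cardinality):
    """Bytes held by the (num_categories - 1) x cardinality matrices alone."""
    rows = num_categories - 1
    per_matrix = rows * cardinality * BYTES_PER_DOUBLE
    return num_learners * models_per_learner * per_matrix


# Figures from the thread: 20 learners, 5 models each, 105 labels, 100K cardinality.
total = dense_weight_bytes(20, 5, 105, 100_000)
print(f"{total:,} bytes = {total * 8:,} bits = {total / 1e9:.2f} GB")

# Same setup with a smaller pool, e.g. 5 learners instead of 20.
smaller = dense_weight_bytes(5, 5, 105, 100_000)
print(f"{smaller / 1e9:.2f} GB with 5 learners")
```

Because the total scales linearly with the number of learners, cutting the pool from 20 to 5 in this sketch cuts the matrix memory to a quarter.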