From: Arni Sumarlidason
To: DAN HELM
CC: "user@mahout.apache.org"
Subject: Re: Mahout: CVB: Error
Date: Sun, 4 Nov 2012 22:44:38 +0000

Dan,

Regarding this thread,
http://comments.gmane.org/gmane.comp.apache.mahout.user/13641

Did you publish your modification to the rowid function enabling the splitting of Matrix files? A single pass on my data takes 9 hours. Does this sound reasonable to you? Please advise.

Best,
Arni

On Nov 3, 2012, at 8:38 PM, DAN HELM wrote:

Arni,

I believe you are running with the wrong input for the cvb command:

./mahout cvb -i /user/root/sparse-vectors-cvb/docIndex .....

It should be:

./mahout cvb -i /user/root/sparse-vectors-cvb/Matrix .....

docIndex is a file generated by rowid that provides a mapping between the original sparse vector keys (in Text format) to the Integer keys assigned by rowid.

Dan

From: Arni Sumarlidason
To: "user@mahout.apache.org"
Sent: Saturday, November 3, 2012 6:35 PM
Subject: Mahout: CVB: Error

Good Evening,

Thank you for reading. I am trying to run CVB on mahout 0.8... I have successfully executed the following steps:

./mahout seqdirectory --input /user/root/lda --output text_seq -c UTF-8 -ow -chunk 8

Resulting in 20 chunk files.

./mahout seq2sparse -i text_seq -o text_vec -wt tf -a org.apache.lucene.analysis.WhitespaceAnalyzer -ow

Resulting in 109MB vector, "part-r-00000", "dictionary.file-0", and more.

./mahout rowid -i text_vec/tf-vectors -o sparse-vectors-cvb

Resulting in "docIndex" & "matrix"

Now... When attempting to run the following command,

./mahout cvb -i /user/root/sparse-vectors-cvb/docIndex -o text_lda -k 100 -x 20 -dict text_vec/dictionary.file-0 -dt text_cvb_document -mt text_states

Resulting in an error: No part files found in model path 'text_states/model-1'

Can someone please point me in the right direction?

Best regards,
Arni
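
The distinction Dan draws between docIndex and the matrix file can be illustrated with a small Python sketch. This is not Mahout's implementation: `rowid_sketch`, the sample document keys, and the dictionary-based vectors are hypothetical stand-ins, and the sketch assumes rowid simply numbers the input vectors in the order it reads them. It shows only why cvb needs the integer-keyed vectors rather than the key mapping.

```python
from typing import Dict, List, Tuple

# A sparse vector is modelled here as {term_id: weight}; this is an
# illustrative stand-in, not Mahout's VectorWritable.
SparseVector = Dict[int, float]


def rowid_sketch(
    vectors: List[Tuple[str, SparseVector]],
) -> Tuple[Dict[int, str], Dict[int, SparseVector]]:
    """Hypothetical model of the two outputs of `mahout rowid`.

    Assumes rows are numbered sequentially in input order.
    """
    doc_index: Dict[int, str] = {}        # integer row id -> original Text key
    matrix: Dict[int, SparseVector] = {}  # integer row id -> sparse vector
    for row_id, (text_key, vector) in enumerate(vectors):
        doc_index[row_id] = text_key
        matrix[row_id] = vector
    return doc_index, matrix


# Made-up term-frequency vectors keyed by document name.
tf_vectors = [
    ("/doc-a.txt", {0: 2.0, 5: 1.0}),
    ("/doc-b.txt", {3: 4.0}),
]

doc_index, matrix = rowid_sketch(tf_vectors)
print(doc_index)  # only the key mapping, no term weights
print(matrix)     # the integer-keyed vectors a topic model would train on
```

In this model docIndex carries no term weights at all, which is consistent with cvb failing when pointed at it. Note that the original message reports the rowid outputs as "docIndex" and "matrix", so the exact case of the path passed to -i should be checked against the actual output directory.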