Date: Tue, 1 Mar 2011 01:51:11 -0500
From: Manoj Kumar
To: user@mahout.apache.org
Subject: Re: LDA Mahout

Hi Jeff Eastman,

Are there any options to perform stopword removal while running LDA in Mahout, or while creating sequence files from the corpus? Kindly reply.

Thanks & Regards,
Manoj Kumar.R.K
Graduate Student, MS Computer Science
University at Buffalo
Buffalo, New York
(413) 461-8938|www.rkmanojkumar.co.nr

On Mon, Feb 28, 2011 at 1:06 PM, Manoj Kumar wrote:
> Hi Jeff Eastman,
>
> Thanks a lot. I'll look into it and will contact you if I need any help.
>
> Thanks & Regards,
> Manoj Kumar.R.K
>
> On Mon, Feb 28, 2011 at 12:48 PM, Jeff Eastman wrote:
>
>> Look at examples/bin/build-reuters.sh for some examples. They are all from
>> the command line but illustrate the best way to do what you are attempting.
>> https://cwiki.apache.org/confluence/display/MAHOUT/K-Means+Clustering also
>> has some example code for doing text processing.
>>
>> -----Original Message-----
>> From: Manoj Kumar [mailto:manoj1987@gmail.com]
>> Sent: Monday, February 28, 2011 9:28 AM
>> To: user@mahout.apache.org
>> Subject: Re: LDA Mahout
>>
>> Hi Jeff Eastman,
>> Thanks for your reply. I looked into the LDADriver class, but I am not sure
>> how to convert my text documents to sequence files and then to SparseVectors
>> to use as input to LDADriver. Can you please help me with this conversion?
>> Also, is it enough to just call the run method in the LDADriver class with
>> appropriate inputs to model the topics?
>>
>> Thanks & Regards,
>> Manoj Kumar.R.K
>>
>> On Mon, Feb 28, 2011 at 12:23 PM, Jeff Eastman wrote:
>>
>> > Have you looked at the Java classes that implement LDA? The private
>> > LDADriver.run() method should be made public, but this can be called from
>> > Java in Eclipse (if that is what you mean by "using Eclipse"). You could
>> > also look at the wiki for information on running LDA
>> > (https://cwiki.apache.org/confluence/display/MAHOUT/Latent+Dirichlet+Allocation).
>> >
>> > -----Original Message-----
>> > From: Manoj Kumar [mailto:manoj1987@gmail.com]
>> > Sent: Monday, February 28, 2011 9:09 AM
>> > To: user@mahout.apache.org
>> > Subject: LDA Mahout
>> >
>> > Hi,
>> >
>> > I am doing a project that requires topic modeling of documents using LDA.
>> > I am planning to implement this using Mahout LDA. I have not been able to
>> > find any sample code for implementing this in Eclipse; only command-line
>> > options were available. Kindly suggest a tutorial or please provide some
>> > basic code for implementing LDA. Kindly reply.
>> >
>> > Thanks & Regards,
>> > Manoj Kumar.R.K
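The thread raises two practical steps that come before LDA can run: turning raw text into sparse term-count vectors, and dropping stop words along the way. Jeff's pointer to examples/bin/build-reuters.sh covers the command-line route. As a hedged illustration of what that vectorization step does conceptually, the plain-Java sketch below lowercases and tokenizes each document, removes stop words, assigns each remaining term an integer index, and keeps only the non-zero counts. It is not Mahout code: the class name StopwordVectorizer and the tiny stop-word list are hypothetical, it writes no SequenceFiles or Mahout vectors, and it does not call LDADriver.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/**
 * Hypothetical, library-independent sketch: tokenize, drop stop words,
 * build a term dictionary, and turn each document into a sparse
 * term-frequency vector (term index -> count). Illustrative only;
 * this is NOT Mahout's implementation.
 */
public class StopwordVectorizer {

    // Assumption: a deliberately tiny English stop-word list for illustration.
    public static final Set<String> STOPWORDS = new HashSet<>(Arrays.asList(
            "a", "an", "and", "the", "of", "to", "in", "is", "for", "on", "with"));

    /** Lowercases, splits on anything that is not an ASCII letter, and removes stop words. */
    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (!t.isEmpty() && !STOPWORDS.contains(t)) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    /** Assigns each distinct term an integer index in first-seen order. */
    public static Map<String, Integer> buildDictionary(List<List<String>> docs) {
        Map<String, Integer> dict = new LinkedHashMap<>();
        for (List<String> doc : docs) {
            for (String term : doc) {
                if (!dict.containsKey(term)) {
                    dict.put(term, dict.size()); // next free index
                }
            }
        }
        return dict;
    }

    /** Sparse TF vector: only non-zero entries are stored (term index -> count). */
    public static Map<Integer, Integer> toSparseVector(List<String> doc, Map<String, Integer> dict) {
        Map<Integer, Integer> vec = new TreeMap<>();
        for (String term : doc) {
            Integer idx = dict.get(term);
            if (idx != null) {
                vec.merge(idx, 1, Integer::sum);
            }
        }
        return vec;
    }

    public static void main(String[] args) {
        String[] corpus = {
            "The topic model and the corpus",
            "A corpus of documents for topic modeling"
        };
        List<List<String>> docs = new ArrayList<>();
        for (String text : corpus) {
            docs.add(tokenize(text));
        }
        Map<String, Integer> dict = buildDictionary(docs);
        System.out.println("dictionary = " + dict);
        for (List<String> doc : docs) {
            System.out.println(toSparseVector(doc, dict));
        }
    }
}
```

Filtering stop words before the dictionary is built keeps very common words out of the vocabulary, so they never receive an index and cannot surface as high-probability topic terms. A real corpus needs a much fuller stop-word list, and whether a given Mahout version's own tools expose an equivalent option should be checked against that version's documentation.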