From: "mvazquez@ova.st"
Subject: Re: Text categorization / classification
Reply-To: java-user@lucene.apache.org
Date: Wed, 27 Oct 2010 23:13:46 -0300
To: "java-user@lucene.apache.org"

Thanks a lot! I was reading about Mahout today. I'll try that out.

Thanks again,
Maria

Sent from my iPhone

On Oct 27, 2010, at 20:59, Lance Norskog wrote:

> There are tools for this in the Mahout project. These are oriented
> toward large-scale work.
>
> http://mahout.apache.org
>
> There is a big learning curve, and you will have to learn some Hadoop.
>
> The book 'Programming Collective Intelligence' includes a suite of
> Python tools for small-scale experiments.
>
> On Wed, Oct 27, 2010 at 1:12 PM, Maria Vazquez wrote:
>> I need to auto-categorize a large number of documents. They are basically news articles from major news sources (nytimes, npr, abcnews, etc.).
>> I'd like to categorize them automatically. Any suggestions?
>> Lucene in Action suggests building category vectors from a set of example documents, then comparing each new document against each of those vectors and picking the closest one.
>> The approach seems pretty simple (judging from other papers I have read on text categorization), but maybe you guys know of something out there that already does this using Lucene/Solr.
>> Thanks!
>> Maria
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>
> --
> Lance Norskog
> goksron@gmail.com
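For reference, the category-vector approach Maria describes (often called a nearest-centroid classifier) can be sketched in plain Java roughly as follows. This is a minimal sketch under stated assumptions, not the code from Lucene in Action or any Lucene/Mahout API: the class `CentroidClassifier` and its `train`/`classify` methods are hypothetical names, tokenization is a naive lowercase split on non-letters, and raw term frequencies stand in for the TF-IDF weights a real setup would more likely derive from a Lucene index. Each category vector is the sum of its unit-length training vectors, and a new document goes to the category with the highest cosine similarity.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: nearest-centroid ("category vector") text classifier. */
class CentroidClassifier {
    // Per category: sum of unit-length term vectors. The sum points in the same
    // direction as the mean, and cosine similarity ignores vector length.
    private final Map<String, Map<String, Double>> centroids = new HashMap<>();

    /** Naive tokenizer (assumption): lowercase, split on non-letters, count terms. */
    static Map<String, Double> termVector(String text) {
        Map<String, Double> tf = new HashMap<>();
        for (String tok : text.toLowerCase().split("[^\\p{L}]+")) {
            if (!tok.isEmpty()) {
                tf.merge(tok, 1.0, Double::sum);
            }
        }
        return normalize(tf);
    }

    /** Scales a vector to unit length (returns an empty map for a zero vector). */
    static Map<String, Double> normalize(Map<String, Double> v) {
        double norm = Math.sqrt(v.values().stream().mapToDouble(x -> x * x).sum());
        Map<String, Double> out = new HashMap<>();
        if (norm == 0) {
            return out;
        }
        v.forEach((term, x) -> out.put(term, x / norm));
        return out;
    }

    /** Dot product; equals cosine similarity when both inputs are unit length. */
    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            Double y = b.get(e.getKey());
            if (y != null) {
                dot += e.getValue() * y;
            }
        }
        return dot;
    }

    /** Adds one labelled example document to its category vector. */
    public void train(String category, String text) {
        Map<String, Double> c = centroids.computeIfAbsent(category, k -> new HashMap<>());
        termVector(text).forEach((term, x) -> c.merge(term, x, Double::sum));
    }

    /** Returns the category whose vector is closest to the text, or null if untrained. */
    public String classify(String text) {
        Map<String, Double> doc = termVector(text);
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, Map<String, Double>> e : centroids.entrySet()) {
            double score = cosine(doc, normalize(e.getValue()));
            if (score > bestScore) {
                bestScore = score;
                best = e.getKey();
            }
        }
        return best;
    }
}
```

Usage would be to call `train` once per hand-labelled article (e.g. a few hundred per section) and then `classify` each incoming article. Summing unit vectors keeps long articles from dominating a category; for a news corpus, stop-word removal and IDF weighting would likely matter more than anything else in this sketch.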