From: Ted Dunning
Date: Tue, 23 Jun 2009 16:04:30 -0700
Subject: Re: mahout PLSI (with some lucene, thrown in)
To: mahout-user@lucene.apache.org
In-Reply-To: <23093.42755.qm@web24605.mail.ird.yahoo.com>

You can definitely use WordNet or any other taxonomical resource that you like. Unfortunately, these rarely help retrieval performance and occasionally damage it severely (often due to excess UI clutter).

On Tue, Jun 23, 2009 at 3:09 PM, Paul Jones wrote:

> I am wondering if there is an easier way, i.e. to use things like existing
> hyponym relations (WordNet and the like), and/or if they do
> not exist then I guess using something similar to a "Google distance measure" may
> help in "adding" new words to the system...