From: Peter Karman
To: lucy-dev@lucene.apache.org
Reply-To: lucy-dev@lucene.apache.org
Date: Tue, 16 Mar 2010 23:00:20 -0500
Subject: Re: [Lucy] Re: MoreLikeThisQuery
Message-ID: <4BA053D4.3050706@peknet.com>
In-Reply-To: <20100316140254.GA29108@rectangular.com>
References: <20100316051735.GB27885@rectangular.com> <8f0ad1f31003160601x5cd7645fi20b072d13cbca32e@mail.gmail.com> <20100316140254.GA29108@rectangular.com>

Marvin Humphrey wrote on 3/16/10 9:02 AM:

> His suggestion was to use OpenCyc to classify terms.
>
> That's similar to what we'd do with topic vectors generated by an indexing
> component, except that the Cyc topic vectors were built laboriously by hand
> rather than using automatic dimension reduction.

I've been looking at the SenseClusters package. It's very unfriendly to use,
but the ideas in it are worth some investigation.

http://www.d.umn.edu/~tpederse/senseclusters.html

It uses SVDPACKC to do the big matrix math:

http://netlib.org/svdpack/

-- 
Peter Karman  .  http://peknet.com/  .  peter@peknet.com