From: Paul Jones
To: mahout-user@lucene.apache.org
Subject: Re: mahout PLSI (with some lucene, thrown in)
Date: Tue, 23 Jun 2009 16:56:32 -0700 (PDT)

Yup, I see that WordNet has also been "ported" to a Lucene index, and hence pulling the hyponyms works great.

Thanks,

Paul

________________________________
From: Tommy Chheng
To: mahout-user@lucene.apache.org
Sent: Tuesday, 23 June, 2009 23:19:25
Subject: Re: mahout PLSI (with some lucene, thrown in)

Have you looked at WordNet to get the hyponyms?

Tommy

On Jun 23, 2009, at 3:09 PM, Paul Jones wrote:

> Okay, have seen the difficulty (apart from the maths :-)).
>
> I guess "similar" can mean many things, i.e. hyponyms, but words such
> as hot...cold are also "related". Hence, to solve my little problem, I
> am wondering if there is an easier way, i.e. to use existing hyponym
> relations (WordNet and the like), and/or, where those do not exist,
> then I guess using something similar to a "google distance measure"
> may help in "adding" new words to the system.
>
> Paul
>
> ________________________________
> From: Ted Dunning
> To: mahout-user@lucene.apache.org
> Sent: Tuesday, 23 June, 2009 18:00:12
> Subject: Re: mahout PLSI (with some lucene, thrown in)
>
> Yes. This can be done. It isn't necessarily real simple to do.
>
> See http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.56.7275 for an
> old (but still pretty good) example.
>
> On Tue, Jun 23, 2009 at 6:45 AM, Paul Jones wrote:
>
>> Imagine we have crawled 100K webpages, and we have 100 pages which show
>> "red", 100 which show "crimson", and then 100 which show both "red and
>> crimson". Is there a way to deduce that there may be an (albeit weak)
>> relationship between red AND crimson? Of course we can pre-seed this info,
>> which then gets weighted by actual results.
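The red/crimson question in the thread can be made concrete with a small co-occurrence calculation. The sketch below is not from the thread and is not Mahout or Lucene code. It assumes the "google distance measure" mentioned above refers to the Normalized Google Distance (NGD), and it reads the example as 100 pages containing only "red", 100 containing only "crimson", and 100 containing both, out of 100,000 crawled pages. The function names `pmi` and `ngd` are made up for illustration.

```python
import math


def pmi(n_x, n_y, n_xy, n_total):
    """Pointwise mutual information (in bits) from page counts.

    n_x, n_y: pages containing each term; n_xy: pages containing both.
    Requires n_xy > 0, since log(0) is undefined.
    """
    p_x = n_x / n_total
    p_y = n_y / n_total
    p_xy = n_xy / n_total
    return math.log2(p_xy / (p_x * p_y))


def ngd(n_x, n_y, n_xy, n_total):
    """Normalized Google Distance from page counts (smaller = more related).

    Assumption: this is the measure the thread calls a "google distance".
    """
    log_x, log_y, log_xy = math.log(n_x), math.log(n_y), math.log(n_xy)
    return (max(log_x, log_y) - log_xy) / (math.log(n_total) - min(log_x, log_y))


# Counts taken from the example in the thread (interpretation is an assumption).
N_TOTAL = 100_000
RED_ONLY, CRIMSON_ONLY, BOTH = 100, 100, 100

n_red = RED_ONLY + BOTH          # every page that mentions "red"
n_crimson = CRIMSON_ONLY + BOTH  # every page that mentions "crimson"

print("PMI (bits):", pmi(n_red, n_crimson, BOTH, N_TOTAL))
print("NGD:", ngd(n_red, n_crimson, BOTH, N_TOTAL))
```

With these counts the two terms co-occur about 250 times more often than independence would predict (PMI near 8 bits, NGD near 0.11), so under this reading the relationship is far from weak. Pre-seeded relations such as WordNet hyponyms could be combined with a score like this, but how to weight the two is left open in the thread.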