From: Paul Jones
To: mahout-user@lucene.apache.org
Date: Tue, 23 Jun 2009 15:09:11 -0700 (PDT)
Subject: Re: mahout PLSI (with some lucene, thrown in)
Message-ID: <23093.42755.qm@web24605.mail.ird.yahoo.com>

Okay, have seen the difficulty (apart from the maths :-)).

I guess "similar" can mean many things, e.g. hyponyms, but words such as hot and cold are also "related". So, to solve my little problem, I am wondering if there is an easier way, e.g. to use existing hyponym relations where they exist (WordNet and the like), and where they do not, I guess something similar to a "google distance measure" may help in "adding" new words to the system.

Paul

________________________________
From: Ted Dunning
To: mahout-user@lucene.apache.org
Sent: Tuesday, 23 June, 2009 18:00:12
Subject: Re: mahout PLSI (with some lucene, thrown in)

Yes.  This can be done.  It isn't necessarily real simple to do.

See http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.56.7275 for an
old (but still pretty good) example.

On Tue, Jun 23, 2009 at 6:45 AM, Paul Jones wrote:

> Imagine we have crawled 100K webpages, and we have 100 pages which show
> "red" and 100 which show "crimson" and then 100 which show both "red and
> crimson". Is there a way to deduce that there may be an (albeit weak)
> relationship between red AND crimson? Of course we can pre-seed this info,
> which then gets weighted by actual results.
>
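
As a rough illustration of the "google distance measure" idea applied to the red/crimson example quoted above, the sketch below computes a Normalized Google Distance from page counts. It is only a sketch under assumptions: it reads the 100/100/100 figures as 100 pages with only "red", 100 with only "crimson", and 100 with both (so each term appears on 200 pages), and the function name `ngd` is made up for illustration. Nothing here is Mahout or Lucene API.

```python
import math

def ngd(fx, fy, fxy, n):
    """Normalized Google Distance from raw page counts.

    fx, fy : number of pages containing each term
    fxy    : number of pages containing both terms
    n      : total number of pages crawled (assumed larger than fx and fy)
    Smaller values mean the terms co-occur more than chance would suggest.
    """
    if fxy == 0:
        # The terms never appear together: treat them as maximally distant.
        return math.inf
    lx, ly, lxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(lx, ly) - lxy) / (math.log(n) - min(lx, ly))

# Assumed reading of the example: 100 red-only + 100 both = 200 pages with
# "red", and likewise 200 pages with "crimson", out of 100K crawled pages.
red = crimson = 200
both = 100
total = 100_000

distance = ngd(red, crimson, both, total)
print(f"NGD(red, crimson) = {distance:.3f}")
```

With those assumed counts the distance comes out at about 0.11, which is small, so the two words would be flagged as related. Pre-seeded relations (from WordNet, say) could then be combined with this score, though how to weight the two is left open here.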