From: Shashi Kant
Date: Tue, 23 Jun 2009 07:17:01 -0400
Subject: Re: Similarity
To: java-user@lucene.apache.org

http://code.google.com/p/semanticvectors/

If you search the archives of this mailing list, there have been plenty of
discussions in the past about LSI/LSA & Lucene.

On Tue, Jun 23, 2009 at 6:55 AM, Cool The Breezer wrote:

> Shashi,
> I am planning to do the same thing that the LSI methodology implements.
> From your message, it seems that you or somebody else may have used the
> LSI approach in Lucene. Can you share some of your work? I am most
> interested in any library, package, or paper used for analyzing terms
> semantically and constructing a vector space.
>
> - RB
>
> ----- Original Message ----
> From: Shashi Kant
> To: java-user@lucene.apache.org
> Sent: Tuesday, June 23, 2009 3:20:16 PM
> Subject: Re: Similarity
>
> I suspect what you are looking for is "Latent Semantics" - it can
> algorithmically infer that "iPod~iPhone" or "Apple~Steve Jobs". Google for
> "Latent Semantic Indexing" or "Latent Semantic Analysis" - you can apply
> some of those approaches using the TermVectors in Lucene index.
> Ontologies such as WordNet are very generic, hence if you have a domain
> specific corpus, you would need to generate some kind of Latent Semantic
> Index to extract the relations therein.
>
> On Tue, Jun 23, 2009 at 5:27 AM, Cool The Breezer wrote:
>
>> Of late I started using Lucene as the main search library for all
>> documents in our intranet. It works extremely well. I am trying to use
>> similarity functionality to find the similarity between two
>> sentences/documents, and I am trying to use WordNet in our search
>> solution. I have used the wordnet contrib package and it works really
>> well to expand queries with synonyms and get results. But I get stuck
>> when a query like "Steve Jobs" should also return documents containing
>> "apple", or when "pirated" should match "willful downloading copyrighted
>> material". This comes down to finding the meaning of a word with respect
>> to its context. Has anybody done this kind of context-based indexing,
>> that is, tokenizing based on the context of each word/token and then
>> searching after expanding the query with synonyms? I have come across
>> some SourceForge projects like http://wn-similarity.sourceforge.net/
>> that semantically relate words using WordNet, but I am still confused
>> about how to move ahead with this kind of context-based search. I
>> appreciate your help. I understand that this might not be directly
>> related to Lucene, but it falls in the same domain of search solutions.
>>
>> - RB
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
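
The idea discussed in this thread, relating terms such as "Steve Jobs" and "apple" through the documents they occur in, can be illustrated with a small sketch. This is not LSI itself (it skips the SVD / dimensionality-reduction step) and it does not call Lucene: it only assumes that per-document term counts are available, which in a real setup might be read from the term vectors stored in a Lucene index. The class name TermSimilarity, its methods, and the sample sentences are hypothetical and exist only for illustration.

```java
import java.util.*;

/** Toy sketch: terms are compared by the documents they occur in (no SVD, no Lucene). */
final class TermSimilarity {

    /** Builds term -> (docId -> term frequency) from lower-cased, whitespace-split documents. */
    static Map<String, Map<Integer, Integer>> termVectors(List<String> docs) {
        Map<String, Map<Integer, Integer>> vectors = new HashMap<>();
        for (int docId = 0; docId < docs.size(); docId++) {
            for (String token : docs.get(docId).toLowerCase().split("\\s+")) {
                if (token.isEmpty()) continue;
                vectors.computeIfAbsent(token, t -> new HashMap<>())
                       .merge(docId, 1, Integer::sum);
            }
        }
        return vectors;
    }

    /** Cosine similarity between two sparse vectors; 0.0 if either is missing or empty. */
    static double cosine(Map<Integer, Integer> a, Map<Integer, Integer> b) {
        if (a == null || b == null) return 0.0;
        double dot = 0, normA = 0, normB = 0;
        for (Map.Entry<Integer, Integer> e : a.entrySet()) {
            normA += e.getValue() * e.getValue();
            Integer other = b.get(e.getKey());
            if (other != null) dot += e.getValue() * other;
        }
        for (int v : b.values()) normB += v * v;
        return (normA == 0 || normB == 0) ? 0.0 : dot / Math.sqrt(normA * normB);
    }

    public static void main(String[] args) {
        // Hypothetical mini-corpus standing in for an indexed document collection.
        List<String> docs = Arrays.asList(
            "steve jobs unveiled the iphone at the apple keynote",
            "apple said steve jobs will present the new ipod",
            "lucene indexes documents for full text search");
        Map<String, Map<Integer, Integer>> tv = termVectors(docs);
        // "jobs" and "apple" share documents; "jobs" and "lucene" do not.
        System.out.println("jobs~apple:  " + cosine(tv.get("jobs"), tv.get("apple")));
        System.out.println("jobs~lucene: " + cosine(tv.get("jobs"), tv.get("lucene")));
    }
}
```

Plain co-occurrence like this only relates terms that appear in the same documents. The LSI/LSA approaches mentioned above additionally reduce the term-document matrix to fewer dimensions, which can also relate terms that never co-occur directly.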