From: Shashi Kant <shashi.mit@gmail.com>
Date: Tue, 23 Jun 2009 07:17:01 -0400
Subject: Re: Similarity
To: java-user@lucene.apache.org

http://code.google.com/p/semanticvectors/

If you search the archives of this mailing-list, there have been
plenty of discussions in the past about LSI/LSA & Lucene.
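
To make the idea concrete, here is a minimal sketch of how per-document
term data can relate words that never appear together. To be clear
about the assumptions: this is not LSI/LSA proper (there is no SVD
step) and it is not the Semantic Vectors API. It is a simpler
co-occurrence comparison, in Python, over a made-up toy corpus. The
token lists stand in for what you would read out of the TermVectors in
your index, and every name below (DOCS, cooccurrence_vectors, cosine,
similarity) is hypothetical.

```python
import math
from collections import Counter, defaultdict

# Toy corpus, made up for illustration. In a real setup each token list
# is assumed to come from the term vector stored for one document.
DOCS = [
    ["steve", "jobs", "apple"],
    ["apple", "ipod"],
    ["apple", "iphone"],
    ["lucene", "index", "search"],
    ["lucene", "search", "query"],
]


def cooccurrence_vectors(docs):
    """Map each term to a Counter of the terms it shares documents with."""
    vectors = defaultdict(Counter)
    for tokens in docs:
        counts = Counter(tokens)
        for term in counts:
            for other, freq in counts.items():
                if other != term:  # a term is not its own context
                    vectors[term][other] += freq
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(weight * b[term] for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity(vectors, term_a, term_b):
    """Compare two terms by the contexts they occur in."""
    return cosine(vectors[term_a], vectors[term_b])


if __name__ == "__main__":
    vecs = cooccurrence_vectors(DOCS)
    for pair in [("ipod", "iphone"), ("jobs", "ipod"), ("ipod", "lucene")]:
        print(pair, round(similarity(vecs, *pair), 3))
```

In this toy corpus "ipod" and "iphone" never occur in the same
document, yet they score as similar because both occur alongside
"apple", while "ipod" and "lucene" share no context and score zero.
Real LSA goes a step further by factoring the term-document matrix, so
it can also pick up less direct relations.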


On Tue, Jun 23, 2009 at 6:55 AM, Cool The Breezer <techcool.kumar@yahoo.com> wrote:
>
> Shashi,
> I think I am planning to do the same thing as is implemented in the
> LSI methodology. From your message, it seems that you or somebody else
> may have used the LSI approach in Lucene. Could you share some of your
> work? I am most interested in any library, package, or paper used for
> analyzing terms semantically and constructing a vector space.
>
> - RB
>
>
> ----- Original Message ----
> From: Shashi Kant <shashi.mit@gmail.com>
> To: java-user@lucene.apache.org
> Sent: Tuesday, June 23, 2009 3:20:16 PM
> Subject: Re: Similarity
>
> I suspect what you are looking for is "Latent Semantics": it can
> algorithmically infer relations such as "iPod~iPhone" or
> "Apple~Steve Jobs". Google for "Latent Semantic Indexing" or "Latent
> Semantic Analysis"; you can apply some of those approaches using the
> TermVectors in a Lucene index.
> Ontologies such as WordNet are very generic, so if you have a
> domain-specific corpus, you would need to generate some kind of Latent
> Semantic Index to extract the relations in it.
>
> On Tue, Jun 23, 2009 at 5:27 AM, Cool The Breezer
> <techcool.kumar@yahoo.com> wrote:
>
>>
>> I recently started using Lucene as the main search library for all
>> documents on our intranet, and it works extremely well. I am now
>> trying to find the similarity between two sentences/documents and to
>> use WordNet in our search solution. I have used the wordnet contrib
>> package, and it works really well for expanding queries with synonyms
>> and getting results. But I get stuck when a query like "Steve Jobs"
>> should return documents containing "apple", or when "pirated" should
>> match "willful downloading of copyrighted material". This comes down
>> to finding the meaning of a word with respect to its context. Has
>> anybody done this kind of context-based indexing, that is, tokenizing
>> based on the context of each word/token and then searching after
>> expanding the query with synonyms? I have come across some SourceForge
>> projects like http://wn-similarity.sourceforge.net/ for semantically
>> relating words using WordNet, but I am still rather confused about how
>> to move ahead with this kind of context-based search. I would
>> appreciate your help. I understand that this might not be directly
>> related to Lucene, but it falls in the same domain of search solutions.
>>
>> - RB
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org