lucene-java-user mailing list archives

From: Cool The Breezer
Subject: Re: Similarity
Date: Tue, 23 Jun 2009 10:55:57 GMT

I think what I intend to do is the same thing the LSI methodology implements.
From your message, it seems that you or somebody else may have used the LSI approach with
Lucene, so could you share some of that work? I am most interested in any library, package,
or paper used for analyzing terms semantically and constructing a vector space.

- RB

----- Original Message ----
From: Shashi Kant
Sent: Tuesday, June 23, 2009 3:20:16 PM
Subject: Re: Similarity

I suspect what you are looking for is "Latent Semantics": it can
algorithmically infer relations such as "iPod~iPhone" or "Apple~Steve Jobs".
Search for "Latent Semantic Indexing" or "Latent Semantic Analysis"; you can
apply some of those approaches using the TermVectors in a Lucene index.
Ontologies such as WordNet are very generic, so if you have a
domain-specific corpus, you would need to generate some kind of Latent
Semantic Index to extract the relations in it.
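
To make the suggestion above concrete, here is a minimal sketch of Latent
Semantic Analysis in plain Python. It is not Lucene code and not necessarily
what anyone on this list has implemented. The toy corpus, the function names
(term_document_matrix, top_eigenpairs, lsa_term_vectors, cosine), and the use
of power iteration with deflation in place of a full SVD routine are all
assumptions made for illustration. In a real setup the term counts would
presumably come from the term vectors stored in the index rather than from
split strings.

```python
import math
import random


def term_document_matrix(docs):
    """Return (vocab, A) where A[i][j] counts term i in document j."""
    vocab = sorted({term for doc in docs for term in doc.split()})
    row = {term: i for i, term in enumerate(vocab)}
    a = [[0.0] * len(docs) for _ in vocab]
    for j, doc in enumerate(docs):
        for term in doc.split():
            a[row[term]][j] += 1.0
    return vocab, a


def top_eigenpairs(sym, k, iterations=200, seed=0):
    """Top-k eigenpairs of a symmetric PSD matrix (power iteration + deflation)."""
    n = len(sym)
    rng = random.Random(seed)
    work = [r[:] for r in sym]
    pairs = []
    for _ in range(k):
        v = [rng.random() + 0.1 for _ in range(n)]
        for _ in range(iterations):
            w = [sum(work[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            v = [x / norm for x in w]
        lam = sum(v[i] * work[i][j] * v[j] for i in range(n) for j in range(n))
        pairs.append((lam, v))
        # Deflate: subtract the component just found so the next pass
        # converges to the next-largest eigenpair.
        for i in range(n):
            for j in range(n):
                work[i][j] -= lam * v[i] * v[j]
    return pairs


def lsa_term_vectors(docs, k=2):
    """Map each term to its k-dimensional latent vector (a row of U_k * Sigma_k)."""
    vocab, a = term_document_matrix(docs)
    n, m = len(vocab), len(docs)
    # Eigenvectors of A * A^T are the left singular vectors of A, and its
    # eigenvalues are the squared singular values.
    gram = [[sum(a[i][d] * a[j][d] for d in range(m)) for j in range(n)]
            for i in range(n)]
    pairs = top_eigenpairs(gram, k)
    return {term: [math.sqrt(max(lam, 0.0)) * v[i] for lam, v in pairs]
            for i, term in enumerate(vocab)}


def cosine(x, y):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(p * q for p, q in zip(x, y))
    norm = math.sqrt(sum(p * p for p in x)) * math.sqrt(sum(q * q for q in y))
    return dot / norm if norm else 0.0


# Hypothetical toy corpus: "ipod" and "iphone" never share a document,
# but both co-occur with "apple".
DOCS = [
    "ipod apple music",
    "iphone apple phone",
    "apple jobs",
    "lucene search index",
    "lucene query search",
    "lucene index query",
]

vocab, counts = term_document_matrix(DOCS)
raw = dict(zip(vocab, counts))
latent = lsa_term_vectors(DOCS, k=2)

print("raw    ipod~iphone:", cosine(raw["ipod"], raw["iphone"]))
print("latent ipod~iphone:", cosine(latent["ipod"], latent["iphone"]))
print("latent ipod~lucene:", cosine(latent["ipod"], latent["lucene"]))
```

On this toy corpus the raw count vectors for "ipod" and "iphone" are
orthogonal, while their rank-2 latent vectors should come out nearly
parallel, which is the kind of inferred relation described above. A real
corpus would typically need far more latent dimensions, some term weighting
such as TF-IDF, and a proper sparse SVD library instead of the hand-rolled
power iteration.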

On Tue, Jun 23, 2009 at 5:27 AM, Cool The Breezer wrote:

> Lately I started using Lucene as the main search library for all documents
> in our intranet. It works extremely well. I am trying to use similarity
> functionality to measure the similarity between two sentences/documents, and
> I am trying to use WordNet in our search solution. I have used the wordnet
> contrib package, and it works really well for expanding queries with synonyms
> and getting results. But I get stuck when a query like "Steve Jobs" should
> return documents containing "apple", or, in the same way, when "pirated"
> should match "willful downloading copyrighted material". This comes down to
> finding the meaning of a word with respect to its context. Has anybody done
> this kind of context-based indexing, that is, tokenizing based on the context
> of each word/token and then searching the same way after expanding the query
> using synonyms? I have come across some sf projects that semantically relate
> words using WordNet, but I am still kind of confused about how to move ahead
> with this kind of context-based search. I would appreciate your help. I
> understand that this might not be directly related to Lucene, but it falls
> in the same domain of search solutions.
> - RB

