lucene-java-user mailing list archives

From: Shashi Kant
Subject: Re: Similarity
Date: Tue, 23 Jun 2009 11:17:01 GMT

If you search the archives of this mailing-list, there have been
plenty of discussions in the past about LSI/LSA & Lucene.
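
As a rough illustration of the LSI/LSA idea discussed in this thread (this is not code from any poster, and it does not call Lucene), the sketch below builds a small term-document count matrix and projects the terms onto its top singular directions. In a real setup the counts would come from the term vectors stored in the Lucene index; here a four-document toy corpus stands in for them. The names `DOCS`, `lsa_term_vectors`, and `raw_term_vectors` are made up for this example, and a plain power iteration is used in place of a proper SVD routine to keep the sketch dependency-free.

```python
import math


def tokenize(text):
    # Naive whitespace tokenizer; a real index would use an Analyzer.
    return text.lower().split()


def term_document_matrix(docs):
    """Rows are terms, columns are documents, cells are raw term counts."""
    vocab = sorted({term for doc in docs for term in tokenize(doc)})
    row_of = {term: i for i, term in enumerate(vocab)}
    matrix = [[0.0] * len(docs) for _ in vocab]
    for col, doc in enumerate(docs):
        for term in tokenize(doc):
            matrix[row_of[term]][col] += 1.0
    return vocab, matrix


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    na, nb = math.sqrt(dot(a, a)), math.sqrt(dot(b, b))
    return dot(a, b) / (na * nb) if na and nb else 0.0


def top_right_singular_vectors(matrix, k, iterations=300):
    """Top-k right singular vectors of the matrix, found by power
    iteration with deflation on the document-document Gram matrix."""
    n_docs = len(matrix[0])
    cols = [[row[j] for row in matrix] for j in range(n_docs)]
    gram = [[dot(ci, cj) for cj in cols] for ci in cols]
    basis = []
    for _component in range(k):
        v = [float(i + 1) for i in range(n_docs)]  # fixed start vector
        for _step in range(iterations):
            w = [dot(row, v) for row in gram]
            norm = math.sqrt(dot(w, w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        eigenvalue = dot(v, [dot(row, v) for row in gram])
        basis.append(v)
        # Deflate: remove the component just found before the next pass.
        gram = [[gram[i][j] - eigenvalue * v[i] * v[j]
                 for j in range(n_docs)] for i in range(n_docs)]
    return basis


def raw_term_vectors(docs):
    """Each term as its unreduced row of per-document counts."""
    vocab, matrix = term_document_matrix(docs)
    return dict(zip(vocab, matrix))


def lsa_term_vectors(docs, k):
    """Each term as a k-dimensional vector in the reduced (latent) space."""
    vocab, matrix = term_document_matrix(docs)
    basis = top_right_singular_vectors(matrix, k)
    return {term: [dot(row, v) for v in basis]
            for term, row in zip(vocab, matrix)}


# Toy corpus: "ipod" and "iphone" never share a document,
# but both co-occur with "apple".
DOCS = [
    "ipod apple music",
    "iphone apple phone",
    "lucene search index",
    "lucene query search",
]
```

With this toy corpus, `cosine(raw_term_vectors(DOCS)["ipod"], raw_term_vectors(DOCS)["iphone"])` is 0.0 because the two terms never appear in the same document, while the same comparison on `lsa_term_vectors(DOCS, 2)` comes out close to 1, and "ipod" versus "lucene" stays close to 0. That second-order association is the kind of relation LSI is meant to surface; on a real corpus the matrix would normally be weighted (for example with tf-idf) and reduced with a sparse SVD library rather than this loop.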

On Tue, Jun 23, 2009 at 6:55 AM, Cool The Breezer wrote:
> Shashi,
> I think what I am planning to do is the same thing the LSI methodology
> implements. From your message, it seems that you or somebody else may have
> used the LSI approach in Lucene. Could you share some of your work? I am
> most interested in any library, package, or paper used for analyzing terms
> semantically and constructing the vector space.
> - RB
> ----- Original Message ----
> From: Shashi Kant
> To:
> Sent: Tuesday, June 23, 2009 3:20:16 PM
> Subject: Re: Similarity
> I suspect what you are looking for is "Latent Semantics": it can
> algorithmically infer relations such as "iPod~iPhone" or "Apple~Steve Jobs".
> Google for "Latent Semantic Indexing" or "Latent Semantic Analysis"; you can
> apply some of those approaches using the TermVectors in a Lucene index.
> Ontologies such as WordNet are very generic, so if you have a
> domain-specific corpus, you would need to generate some kind of Latent
> Semantic Index to extract the relations therein.
> On Tue, Jun 23, 2009 at 5:27 AM, Cool The Breezer wrote:
>> Lately I started using Lucene as the main search library for all documents
>> in our intranet. It works extremely well. I am trying to use similarity
>> functionality to measure the similarity between two sentences/documents,
>> and I am trying to use WordNet in our search solution. I have used the
>> wordnet contrib package, and it works really well for expanding queries
>> with synonyms. But I am stuck when a query like "Steve Jobs" should return
>> documents containing "apple", or when "pirated" should match "willful
>> downloading of copyrighted material". This comes down to finding the
>> meaning of a word with respect to its context. Has anybody done this kind
>> of context-based indexing, that is, tokenizing based on the context of
>> each word/token and then searching after expanding the query with
>> synonyms? I have come across some sf projects that semantically relate
>> words using WordNet, but I am still confused about how to move ahead with
>> this kind of context-based search. I would appreciate your help. I
>> understand that this might not be directly related to Lucene, but it falls
>> in the same domain of search solutions.
>> - RB