lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Xiangyu Jin <x...@cs.virginia.edu>
Subject Re: similarity matrix - more clear
Date Tue, 30 Nov 2004 15:09:34 GMT


I also have the same task as you do. According to my understanding,
suppose their are N documents, your approach will take N^2 similarity
calculations.

Although there are N(N-1)/2 distinct document pairs,
the similarity calculation (according to my understanding) in Lucene is
asymmetric, so this means you have to calculate N(N-1) similaries.
Therefore, seems your approach is not so redundant since you have to
calculate O(N^2) order of similarities.

On Tue, 30 Nov 2004, Roxana Angheluta wrote:

> Dear all,
>
> Yesterday I've asked a question about geting the similarity matrix of a
> collection of documents from an index, but I got only one answer, so
> perhaps my question was not very clear.
>
> I will try to reformulate:
>
> I want to use Lucene to have efficient access to an index of a
> collection of documents. My final purpose is to cluster documents.
> Therefore I need to have for each pair of documents a number signifying
> the similarity between them.
> A possible solution would be to initialize in turn each document as a
> query, do a search using an IndexSearcher and to take from the search
> result the similarity between the query (which is in fact a document)
> and all the other documents. This is highly redundant, because the
> similarity between a pair of documents is computed multiple times.
>
> I was wondering whether there is a simpler way to do it, since the index
> file contains all the information needed. Can anyone help me here?
>
> Thanks,
> roxana
>
> PS I know about the project Carrot2, which deals with document
> clustering, but I think is not appropriate for me because of 2 reasons:
> 1) I need to keep the index on the disk for further reusage
> 2) I need to be able to search efficiently in the index
> I thought Lucene can help me here, am I wrong?
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-user-help@jakarta.apache.org
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Mime
View raw message