lucene-java-user mailing list archives

From Beto Siless <b...@tera-code.com.ar>
Subject Re: near duplicates
Date Tue, 24 Oct 2006 14:08:15 GMT

Hi Karl!
I'm interested in near-duplicate detection based on TermFreqVectors. Right
now I'm comparing every document with every other one (calculating the
angle between their vectors)... Is there a way to avoid that?

Thanks!
Beto

karl wettin wrote:
> 
> On 17 Oct 2006, at 17:54, Find Me wrote:
> 
>> How to eliminate near duplicates from the index?
> 
> Oh, one more thing. You should probably look at the norms in order to 
> avoid comparing all documents to each other.
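
One possible reading of the hint about norms, sketched under stated assumptions: near duplicates tend to have vectors of similar length, so documents can be sorted by vector length and compared only with neighbours whose length is within some ratio. This is a heuristic, not an exact bound, because the angle is scale-invariant (a document and the same text repeated twice have angle zero but very different lengths). The sketch does not read Lucene's stored norms; it works on plain term -> frequency maps, and the class name, method names and both thresholds are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Lucene: each document is a term -> frequency map.
final class NearDuplicateSketch {

    // Euclidean length of a term-frequency vector.
    static double length(Map<String, Integer> v) {
        double sum = 0.0;
        for (int f : v.values()) {
            sum += (double) f * f;
        }
        return Math.sqrt(sum);
    }

    // Cosine of the angle between two term-frequency vectors (0.0 if either is empty).
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0.0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            Integer other = b.get(e.getKey());
            if (other != null) {
                dot += e.getValue().doubleValue() * other.doubleValue();
            }
        }
        double na = length(a);
        double nb = length(b);
        return (na == 0.0 || nb == 0.0) ? 0.0 : dot / (na * nb);
    }

    // Returns index pairs {i, j} with i < j whose cosine is at least minCosine.
    // Assumption (heuristic): pairs whose lengths differ by more than
    // maxLengthRatio are never compared at all.
    static List<int[]> nearDuplicates(List<Map<String, Integer>> docs,
                                      double maxLengthRatio,
                                      double minCosine) {
        int n = docs.size();
        double[] len = new double[n];
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            len[i] = length(docs.get(i));
            order.add(i);
        }
        // Sort document indices by vector length, shortest first.
        order.sort(Comparator.comparingDouble(i -> len[i]));

        List<int[]> pairs = new ArrayList<>();
        for (int x = 0; x < n; x++) {
            int i = order.get(x);
            for (int y = x + 1; y < n; y++) {
                int j = order.get(y);
                // Sorted order: every later document is at least this long, so stop here.
                if (len[j] > len[i] * maxLengthRatio) {
                    break;
                }
                if (cosine(docs.get(i), docs.get(j)) >= minCosine) {
                    pairs.add(new int[] {Math.min(i, j), Math.max(i, j)});
                }
            }
        }
        return pairs;
    }
}
```

With documents of widely varying length the inner loop exits early for most pairs; if all documents have about the same length it degrades to the full all-pairs comparison.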
