lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Aaron Schon <aaron_sc...@yahoo.com>
Subject Extremely Large Strings Comparison (slightly off-topic)
Date Fri, 14 Nov 2008 21:59:57 GMT
hi I need to compare two Base64 representation strings of some MIME content that I am storing
within a Lucene index. I need to efficiently compare them to find the closest match to a query
Base64 string , post Lucene query.

I am not sure of the best way to approach this, could I compare the hashes and compute their
similarity? Levenshtein distance seems hard because of the size of ths strings and seems inefficient?
Is there any other method you could suggest?

n.b: The idea is to not to determine exact match or not, it is to compute a similarity metric.
for example

John & Johnson (closer)
vs,
John & Jimmy (farther)

tia,
Aaron


      

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message