lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Dawid Weiss <>
Subject Re: PhraseQuery and edit distance slightly confusing.
Date Wed, 15 Mar 2006 20:10:30 GMT

Hi Doug,

> Yes, it should probably be called "edit-distance-like" or something.

It should definitely say so in the JavaDoc because I've seen this 
propagate to people's articles (it was Eric Hatcher's I think, but I'm 
not sure).

> But what then would the criteria for matching at all be?  Right now it 
> is "distance <= slop", but, with this change, shouldn't it also take 
> into account the match length?

To be honest I didn't go that deep in the subject. I just found myself 
perplexed a bit when I saw you calculating the edit distance as a 
difference of two values :) Nothing personal of course, your code in 
there is quite neat but it tosses all all kinds of indices around and I 
actually thought I got it wrong somehow so I asked to be sure.

>  Right now the penalty is like the maximum error, but the sum of errors 
> in the match might be better.

That's the idea I came up too -- aggregating the error over absolute 
values of positional differences (?) would solve the problem; I'd also 
go for incrementing/ decrementing totalError instead of changing method 
signatures. I'm just not sure if it makes sense to change the behavior 
of PhraseQuery just because I had to please my curiosity :). People are 
used to how it works and maybe it should stay this way. If somebody 
really wants a real edit distance then with your hints and understanding 
of how PhraseQuery works it should be quite straightforward to implement.

Thanks for clarifications,

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message