lucene-java-user mailing list archives

From David Spencer <>
Subject DoubleMetaphoneQuery
Date Fri, 19 Dec 2003 19:51:44 GMT

I've seen discussions about using the double metaphone algorithm with
Lucene (basically: like soundex, used to find words that sound similar,
in English at least) but couldn't find an implementation, so I spent
a few minutes and wrote a Query and TermEnum object for this. I may have
missed the prior art, so sorry if I did...

[1] Here are some mail msgs that mention double metaphone wrt Lucene:

[2] And Phoenix has a double metaphone Analyzer, but not a Query, which
I guess is another angle on things:

[3] Attached are 2 files (DoubleMetaphoneQuery and
DoubleMetaphoneTermEnum) that I think are valid contributions
to the Lucene Sandbox. Hopefully all that has to be done is to change the
package line, if the powers that be accept this.

Note: My impl uses the Jakarta CODEC package ( ) for the double metaphone 
algorithm implementation.

Also, any query expansion like this can produce more clauses than a
BooleanQuery allows, so BooleanQuery.setMaxClauseCount may need to be
called to raise the limit and avoid an exception.
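
To make the clause-count concern concrete, here is a minimal, self-contained
sketch of how a phonetic expansion collects one clause per matching index
term and can run past a limit. It does not use Lucene or the attached
classes: toyKey is a deliberately crude stand-in for double metaphone, and
the names PhoneticExpansionSketch, toyKey and expand are made up for
illustration only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class PhoneticExpansionSketch {

    /**
     * Toy phonetic key, NOT double metaphone (assumption for illustration).
     * Uppercases the word, folds C/Q into K and Z into S, drops every
     * vowel except a leading one, and collapses repeated letters.
     */
    static String toyKey(String word) {
        StringBuilder key = new StringBuilder();
        String w = word.toUpperCase();
        for (int i = 0; i < w.length(); i++) {
            char c = w.charAt(i);
            if (c == 'C' || c == 'Q') c = 'K';
            if (c == 'Z') c = 'S';
            boolean vowel = "AEIOUY".indexOf(c) >= 0;
            if (vowel && i > 0) continue;
            if (key.length() > 0 && key.charAt(key.length() - 1) == c) continue;
            key.append(c);
        }
        return key.toString();
    }

    /**
     * Expands queryTerm into every index term with the same toy key.
     * Each match stands for one clause of a boolean OR query; the method
     * throws once the number of matches would exceed maxClauses.
     */
    static List<String> expand(String queryTerm, List<String> indexTerms, int maxClauses) {
        String target = toyKey(queryTerm);
        List<String> clauses = new ArrayList<>();
        for (String term : indexTerms) {
            if (toyKey(term).equals(target)) {
                if (clauses.size() >= maxClauses) {
                    throw new IllegalStateException("too many clauses: limit is " + maxClauses);
                }
                clauses.add(term);
            }
        }
        return clauses;
    }

    public static void main(String[] args) {
        // Hypothetical index terms, chosen only for this example.
        List<String> terms = Arrays.asList("protocol", "protocols", "packet", "protokol", "proxy");
        System.out.println(expand("protokal", terms, 1024));
    }
}
```

On a real index the number of matching terms depends on the vocabulary, so
the limit is either raised up front or the exception is caught and
reported to the user.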

[4] I've updated my Lucene demo site which has the ~3500 RFCs indexed 
and searchable by Lucene. I added an "advanced query"
page to try out the DoubleMetaphoneQuery:

It's a few lines down at this URL:

[5] Most of the above is redundantly stated here as a kind of perma-link:


While it's easy to write additional Query classes, I suspect they are a
kind of dead end and won't really be used unless they are integrated into
the QueryParser. One idea is that the Lucene query syntax should have some
extension mechanism, so you can pass it a query like "metaphone::protokal"
and the "metaphone::" prefix (note the double colons) would mean to use
DoubleMetaphoneQuery for this term. Maybe an extensible query parser
should be the subject of another email?
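
As a rough illustration of the "prefix::term" idea, here is a small
self-contained sketch of a registry that maps an extension prefix to a
query factory. It is not the Lucene QueryParser: the class
ExtensibleSyntaxSketch and its methods are hypothetical, and it returns
descriptive strings where a real parser would build Query objects.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

class ExtensibleSyntaxSketch {

    // Maps an extension prefix (the part before "::") to a factory that
    // turns the term into a query description. Strings stand in for
    // Lucene Query objects here (assumption for illustration).
    private final Map<String, Function<String, String>> extensions = new HashMap<>();

    void register(String prefix, Function<String, String> factory) {
        extensions.put(prefix, factory);
    }

    /**
     * Parses a single clause. "prefix::term" is dispatched to the
     * registered extension; anything without "::" is treated as a
     * plain term.
     */
    String parseClause(String clause) {
        int sep = clause.indexOf("::");
        if (sep > 0) {
            String prefix = clause.substring(0, sep);
            String term = clause.substring(sep + 2);
            Function<String, String> factory = extensions.get(prefix);
            if (factory == null) {
                throw new IllegalArgumentException("unknown extension: " + prefix);
            }
            return factory.apply(term);
        }
        return "TermQuery(" + clause + ")";
    }

    public static void main(String[] args) {
        ExtensibleSyntaxSketch parser = new ExtensibleSyntaxSketch();
        parser.register("metaphone", term -> "DoubleMetaphoneQuery(" + term + ")");
        System.out.println(parser.parseClause("metaphone::protokal"));
        System.out.println(parser.parseClause("protocol"));
    }
}
```

The double colon keeps the extension syntax from colliding with the
existing single-colon "field:term" form, which this sketch does not
attempt to handle.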

So: let me know if this is useful, and please enter it into the sandbox...

 Dave Spencer
