lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Jayakumar.V" <>
Subject Issue with sounds-like queries
Date Tue, 27 Sep 2005 13:53:54 GMT


I'm facing an issue with sounds-like queries. I've experimented with both
Apache Codec & the Phonetix library from Tangentum Technologies
( ) to see if I
could sort out the issue somehow using either of the libraries. 


I've an index containing details of various Banks in the world & their
associated Branches. 

Each document has a field holding the Branch Name(s) for the individual


While searching for the following branch name :-  QUILON, it also returns
back details where the branch name may contain the word COLONY, since using
Metaphone or DoubleMetaphone, both QUILON & COLONY get encoded to the same
value :-  KLN. 

This returns in-correct results. 


Another example would be CALICUT (located in South India) & CALCUTTA
(located in North India), both get encoded to KLKT.


I can narrow down the result by filtering based on COUNTRY or COUNTRY +
STATE but still I might get back results which may not be the one intended.


I also tried using the RefinedSoundex class. The issue here is that, "QUILON
BRANCH" will get encoded as - Q50708190830, whereas "QUILON" alone will get
encoded as - Q50708. The user may input only "QUILON" while making a search
which will not return back hits in the above case.


Hope I was clear in communicating the issue. 


Any thoughts / inputs will be really helpful.



Thanks & Regards



UAE Xchange Center

PB.No. : 170, Abudhabi, UAE

Phone: + 971-2-6105656, 6105658

Fax: +971-2-6323775


Confidentiality Notice :  This e-mail message, including any attachments, is
for the sole use of the intended recipient(s) and may contain confidential
and privileged information. Any unauthorized review, use, disclosure or
distribution is prohibited. If you are not the intended recipient, please
contact the sender by reply e-mail and destroy all copies of the original



  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message