lucene-dev mailing list archives

From bugzi...@apache.org
Subject DO NOT REPLY [Bug 10340] - [PATCH] Phonetic Search capability
Date Tue, 20 Jan 2004 00:33:23 GMT
DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG 
RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT
<http://nagoya.apache.org/bugzilla/show_bug.cgi?id=10340>.
ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND 
INSERTED IN THE BUG DATABASE.

http://nagoya.apache.org/bugzilla/show_bug.cgi?id=10340

[PATCH] Phonetic Search capability

------- Additional Comments From otis@apache.org  2004-01-20 00:33 -------
Copy/paste from Robert's original email follows.

I took the reference to Phonetix and went one better: the attached
patch allows for phonetic searching without adding new terms, fields,
or analyzers.

There is an interface, 'PhoneticProvider', that IndexReaders can
implement to improve performance; otherwise the search falls back to a
linear scan of the terms, similar to the way Fuzzy searches work.
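
A minimal sketch of the lookup-with-fallback idea described above. The
patch's actual PhoneticProvider signature is not shown in this message,
so the method name termsForCode, the class PhoneticLookup, and the use
of plain Java collections in place of Lucene's IndexReader and term
enumeration are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical shape: the real interface in the patch may differ.
interface PhoneticProvider {
    // Returns the indexed terms whose encoded form equals the given code.
    List<String> termsForCode(String code);
}

class PhoneticLookup {
    // 'reader' stands in for an IndexReader; 'terms' stands in for its term list.
    static List<String> matchingTerms(Object reader,
                                      Iterable<String> terms,
                                      Function<String, String> encoder,
                                      String queryTerm) {
        String code = encoder.apply(queryTerm);

        // Fast path: the reader can answer directly.
        if (reader instanceof PhoneticProvider) {
            return ((PhoneticProvider) reader).termsForCode(code);
        }

        // Fallback: encode every term and compare, a linear scan.
        List<String> matches = new ArrayList<>();
        for (String term : terms) {
            if (code.equals(encoder.apply(term))) {
                matches.add(term);
            }
        }
        return matches;
    }
}
```

The fallback costs one encoder call per indexed term on every query,
which is why a reader that implements the interface can be much faster.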

An interesting point is that the encoder is completely definable, so
'phonetic searching' does not necessarily have to relate to 'phonetics'
at all. It can instead be viewed as 'alternate term' support, where a
single term can have an alternate representation.
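
To illustrate the 'alternate term' view, here is a sketch with two
interchangeable encoders. The TermEncoder interface and both class names
are hypothetical and are not taken from the patch or from Phonetix. The
first encoder follows the usual American Soundex rules; the second has
nothing to do with sound and simply maps anagrams to the same code.

```java
import java.util.Arrays;
import java.util.Locale;

// Hypothetical encoder contract: one term in, one alternate form out.
interface TermEncoder {
    String encode(String term);
}

// Soundex-style encoder: first letter plus up to three digit codes.
class SoundexEncoder implements TermEncoder {
    // Digit for each letter a..z; '0' marks vowels and other uncoded letters.
    private static final String CODES = "01230120022455012623010202";

    public String encode(String term) {
        StringBuilder out = new StringBuilder();
        char last = '0';
        for (char raw : term.toCharArray()) {
            char c = Character.toLowerCase(raw);
            if (c < 'a' || c > 'z') continue;      // skip non-letters
            char code = CODES.charAt(c - 'a');
            if (out.length() == 0) {
                out.append(Character.toUpperCase(c));
                last = code;
                continue;
            }
            if (c == 'h' || c == 'w') continue;    // h and w do not separate equal codes
            if (code != '0' && code != last) out.append(code);
            last = code;
            if (out.length() == 4) break;
        }
        while (out.length() < 4) out.append('0');  // pad short codes
        return out.toString();
    }
}

// Non-phonetic encoder: anagrams share one representation.
class SortedLettersEncoder implements TermEncoder {
    public String encode(String term) {
        char[] letters = term.toLowerCase(Locale.ROOT).toCharArray();
        Arrays.sort(letters);
        return new String(letters);
    }
}
```

With the first encoder "balloon" and "baloon" receive the same code;
with the second, "listen" and "silent" do. The search machinery does not
need to know which kind of encoder it was given.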

The expression language has been changed so that a term ending with "$"
is treated as a phonetic search, so

+balloon$

would find all terms that sound like balloon.
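
As a rough illustration of the syntax only, the sketch below splits a
single query token into its parts. It is not the patch's change to the
query parser grammar; the class ParsedTerm and its fields are
hypothetical, and only the leading "+" and trailing "$" are handled.

```java
// Hypothetical holder for one parsed query token.
class ParsedTerm {
    final String text;       // the bare term, without "+" or "$"
    final boolean required;  // token started with "+"
    final boolean phonetic;  // token ended with "$"

    ParsedTerm(String text, boolean required, boolean phonetic) {
        this.text = text;
        this.required = required;
        this.phonetic = phonetic;
    }

    static ParsedTerm parse(String token) {
        boolean required = token.startsWith("+");
        String text = required ? token.substring(1) : token;

        // A lone "$" is left alone; otherwise a trailing "$" marks a phonetic term.
        boolean phonetic = text.length() > 1 && text.endsWith("$");
        if (phonetic) {
            text = text.substring(0, text.length() - 1);
        }
        return new ParsedTerm(text, required, phonetic);
    }
}
```

Parsing "+balloon$" this way yields the term "balloon", marked as both
required and phonetic.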

This implementation will work with all existing index files, but if the
standard IndexReader/Writer were modified to store an 'encoding index'
for each term, it would be easy to implement PhoneticProvider, which
would eliminate the linear term search.
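
One way to picture such an 'encoding index' is a map from each code to
the terms that produce it, filled in as terms are written. This is an
in-memory sketch under that assumption; the class EncodingIndex and its
methods are hypothetical, and nothing here reflects how Lucene's index
files would actually store the data.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;
import java.util.function.Function;

// Hypothetical code-to-terms index maintained at write time.
class EncodingIndex {
    private final Function<String, String> encoder;
    private final Map<String, SortedSet<String>> termsByCode = new HashMap<>();

    EncodingIndex(Function<String, String> encoder) {
        this.encoder = encoder;
    }

    // Called once per term when it is added to the index.
    void addTerm(String term) {
        termsByCode
            .computeIfAbsent(encoder.apply(term), k -> new TreeSet<>())
            .add(term);
    }

    // Single map lookup at query time instead of a scan over every term.
    SortedSet<String> termsForCode(String code) {
        SortedSet<String> terms = termsByCode.get(code);
        return terms == null
            ? Collections.<String>emptySortedSet()
            : Collections.unmodifiableSortedSet(terms);
    }
}
```

The trade-off is extra work and storage at indexing time in exchange for
a constant-time lookup per query term.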

The posted patch contains code under the LGPL that came directly from
the Phonetix library. This is my first patch, so I am not sure that is
OK; I might instead have to post the entire library and change the
build to link with it?

Let me know what you think.

Robert Engels

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-dev-help@jakarta.apache.org

