lucene-dev mailing list archives

From "Robert Engels" <reng...@ix.netcom.com>
Subject [PATCH] 1030 Phonetic Search Capability Implemented
Date Wed, 07 Jan 2004 20:20:51 GMT
I took the reference to Phonetix and went one better... the attached patch
allows for phonetic searching without adding new terms, fields, or
analyzers.

There is an interface, 'PhoneticProvider', that IndexReaders can implement to
improve performance; otherwise the search falls back to a linear scan of
terms, similar to the way fuzzy searches work.
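
To illustrate the fallback, here is a rough sketch in modern Java. It is not
the code from the patch: the method 'termsWithCode', the 'TermSource'
interface, and the toy encoder are all assumptions made up for this example,
and only the name 'PhoneticProvider' comes from the patch description.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

public class PhoneticLookupSketch {

    /** Assumed shape of the optional interface; the real signature is not shown here. */
    public interface PhoneticProvider {
        List<String> termsWithCode(String code);
    }

    /** Hypothetical stand-in for a reader that can only enumerate its terms. */
    public interface TermSource {
        Iterable<String> terms();
    }

    /** Uses the provider when the reader offers one, else scans every term. */
    public static List<String> phoneticMatches(
            TermSource reader, UnaryOperator<String> encoder, String queryTerm) {
        String code = encoder.apply(queryTerm);
        if (reader instanceof PhoneticProvider) {
            // Fast path: the reader can look terms up by their encoding directly.
            return ((PhoneticProvider) reader).termsWithCode(code);
        }
        // Slow path: encode each indexed term and compare, one by one.
        List<String> matches = new ArrayList<>();
        for (String term : reader.terms()) {
            if (encoder.apply(term).equals(code)) {
                matches.add(term);
            }
        }
        return matches;
    }

    /** Toy encoder, not a real phonetic algorithm: collapse repeated letters, drop later vowels. */
    public static String toyEncode(String term) {
        String collapsed = term.toLowerCase().replaceAll("(.)\\1+", "$1");
        if (collapsed.isEmpty()) {
            return collapsed;
        }
        return collapsed.charAt(0) + collapsed.substring(1).replaceAll("[aeiou]", "");
    }
}
```

The cost of the slow path grows with the number of distinct terms in the
index, which is why a reader-side lookup is worth having.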

An interesting point is that the encoder is completely definable, so
'phonetic searching' does not necessarily have to relate to 'phonetics' at
all; it can instead be viewed as 'alternate term' support, where a single
term can have an alternate representation.

The query expression language has been changed so that a term ending with
"$" is treated as a phonetic search, so

+balloon$

would find all terms that sound like balloon.
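
As an illustration only, the suffix handling could look roughly like the
sketch below. The 'Clause' class and 'parse' method are hypothetical names
for this example and are not taken from the patch or from the Lucene query
parser.

```java
public class PhoneticSyntaxSketch {

    /** Hypothetical parsed form of one query token. */
    public static final class Clause {
        public final boolean required;  // token started with "+"
        public final boolean phonetic;  // token ended with "$"
        public final String term;       // the bare term text

        Clause(boolean required, boolean phonetic, String term) {
            this.required = required;
            this.phonetic = phonetic;
            this.term = term;
        }
    }

    /** Splits a token such as "+balloon$" into its flags and bare term. */
    public static Clause parse(String token) {
        boolean required = token.startsWith("+");
        String body = required ? token.substring(1) : token;
        // A lone "$" is left alone; only a "$" after some text marks a phonetic term.
        boolean phonetic = body.length() > 1 && body.endsWith("$");
        String term = phonetic ? body.substring(0, body.length() - 1) : body;
        return new Clause(required, phonetic, term);
    }
}
```

With this sketch, "+balloon$" parses to a required, phonetic clause for the
term "balloon", while "balloon" alone stays an ordinary term.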

This implementation will work with all existing index files, but if the
standard IndexReader/Writer were modified to store an 'encoding index' for
each term, it would be easy to implement PhoneticProvider, which would
eliminate the linear term search.
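
One way to picture such an 'encoding index' is a map from each encoding to
the terms that share it, filled in as terms are written. The sketch below is
an in-memory assumption for illustration; the class and method names are
invented, and it says nothing about how Lucene would store this on disk.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;
import java.util.function.UnaryOperator;

public class EncodingIndexSketch {

    // encoding -> all indexed terms that produce that encoding
    private final Map<String, SortedSet<String>> termsByCode = new HashMap<>();
    private final UnaryOperator<String> encoder;

    public EncodingIndexSketch(UnaryOperator<String> encoder) {
        this.encoder = encoder;
    }

    /** Called once per term at index time. */
    public void addTerm(String term) {
        termsByCode.computeIfAbsent(encoder.apply(term), k -> new TreeSet<>()).add(term);
    }

    /** Lookup by encoding, with no scan over the full term list. */
    public List<String> termsWithCode(String code) {
        SortedSet<String> terms = termsByCode.get(code);
        if (terms == null) {
            return Collections.emptyList();
        }
        return new ArrayList<>(terms);
    }
}
```

The trade-off is extra work and storage at index time in exchange for a
direct lookup at search time.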

The posted patch contains code under the LGPL that came directly from the
Phonetix library. This is my first patch, so I am not sure whether that is
OK, or whether I would instead have to post the entire library and change
the build to link with it.

Let me know what you think.

Robert Engels




