lucene-dev mailing list archives

From "Robert Engels" <reng...@ix.netcom.com>
Subject RE: [PATCH] 1030 Phonetic Search Capability Implemented
Date Wed, 07 Jan 2004 20:27:34 GMT
One thing to be aware of: the distance returned might be "incorrect".

I could not find documentation on exactly how it should be normalized
and/or filtered, so the distances returned (if you do not set
minimumDistance) vary from 1.0 to 2.0.

Robert

-----Original Message-----
From: Robert Engels [mailto:rengels@ix.netcom.com]
Sent: Wednesday, January 07, 2004 2:21 PM
To: Lucene-Dev
Subject: [PATCH] 1030 Phonetic Search Capability Implemented


I took the reference to Phonetix and went one better... the attached patch
allows for phonetic searching without adding new terms, fields, or
analyzers.

There is an interface, 'PhoneticProvider', that IndexReaders can implement to
improve performance; otherwise it falls back to a linear search of terms,
similar to the way Fuzzy searches work.
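
As a rough illustration of the fallback described above (this is not the
code from the patch), the lookup could be structured as follows. The method
name termsForCode, the toy encoder, and the plain term list are assumptions
made for the example; the message does not show the real PhoneticProvider
signature.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

class PhoneticFallbackSketch {

    // Hypothetical shape: the real PhoneticProvider signature is not shown in the message.
    interface PhoneticProvider {
        List<String> termsForCode(String code);
    }

    // Toy encoder (an assumption, not the Phonetix algorithm): keeps the first
    // letter, drops later vowels, and skips a letter equal to the last one kept.
    static String toyEncode(String term) {
        String s = term.toLowerCase(Locale.ROOT);
        StringBuilder code = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (i > 0 && "aeiou".indexOf(c) >= 0) continue;
            if (code.length() > 0 && code.charAt(code.length() - 1) == c) continue;
            code.append(c);
        }
        return code.toString();
    }

    // Uses the provider when the reader offers one; otherwise scans every term.
    static List<String> soundsLike(Object reader, List<String> allTerms, String queryTerm) {
        String code = toyEncode(queryTerm);
        if (reader instanceof PhoneticProvider) {
            return ((PhoneticProvider) reader).termsForCode(code);
        }
        List<String> matches = new ArrayList<>();
        for (String term : allTerms) {
            if (toyEncode(term).equals(code)) {
                matches.add(term);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        List<String> terms = Arrays.asList("balloon", "baloon", "ballot", "bell");
        // A plain Object is not a PhoneticProvider, so this takes the linear path.
        System.out.println(soundsLike(new Object(), terms, "ballon"));
    }
}
```

The linear path costs one encoder call per term in the index, which is why a
reader that can answer termsForCode directly would be faster.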

An interesting point is that the encoder is completely definable, so
'phonetic searching' does not necessarily have to relate to 'phonetics' at
all; rather, it can be viewed as 'alternate term' support, where a single
term can have an alternate representation.
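
To make the 'alternate term' idea concrete, here is a small sketch with an
encoder that has nothing to do with phonetics. The TermEncoder interface and
the anagram encoder are hypothetical names invented for this example; the
encoder type used by the patch is not shown in the message.

```java
import java.util.Arrays;
import java.util.Locale;

class AlternateTermSketch {

    // Hypothetical encoder abstraction; the patch's actual encoder type is not shown.
    interface TermEncoder {
        String encode(String term);
    }

    // A non-phonetic encoder: terms made of the same letters share a code.
    static final TermEncoder ANAGRAM = term -> {
        char[] letters = term.toLowerCase(Locale.ROOT).toCharArray();
        Arrays.sort(letters);
        return new String(letters);
    };

    // Two terms are alternates when the chosen encoder maps them to the same code.
    static boolean alternates(TermEncoder encoder, String a, String b) {
        return encoder.encode(a).equals(encoder.encode(b));
    }

    public static void main(String[] args) {
        System.out.println(alternates(ANAGRAM, "listen", "silent"));  // true
        System.out.println(alternates(ANAGRAM, "listen", "balloon")); // false
    }
}
```

Swapping in a different TermEncoder changes what counts as a match without
touching the search logic.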

The expression language has been changed so that a term ending with "$" is
treated as a phonetic search, so

+balloon$

would find all terms that sound like balloon.
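
A minimal standalone sketch of how such a token might be recognized is shown
below. It is only an illustration: the Clause class and parse method are
hypothetical, and the patch's actual change to the query grammar is not
included in this message.

```java
class PhoneticSyntaxSketch {

    // Hypothetical holder for one parsed query token.
    static final class Clause {
        final boolean required;
        final boolean phonetic;
        final String term;

        Clause(boolean required, boolean phonetic, String term) {
            this.required = required;
            this.phonetic = phonetic;
            this.term = term;
        }
    }

    // Treats a leading "+" as "required" and a trailing "$" as "phonetic".
    static Clause parse(String token) {
        boolean required = token.startsWith("+");
        String rest = required ? token.substring(1) : token;
        // Require at least one character before the "$" so a bare "$" stays a plain term.
        boolean phonetic = rest.length() > 1 && rest.endsWith("$");
        String term = phonetic ? rest.substring(0, rest.length() - 1) : rest;
        return new Clause(required, phonetic, term);
    }

    public static void main(String[] args) {
        Clause c = parse("+balloon$");
        System.out.println(c.required + " " + c.phonetic + " " + c.term); // true true balloon
    }
}
```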

This implementation will work with all existing index files, but if the
standard IndexReader/Writer were modified to store an 'encoding index' for
each term, it would be easy to implement PhoneticProvider, which would avoid
the linear term search.
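
The idea of an 'encoding index' can be sketched as a map from each code to
the terms that produce it, filled in as terms are written. This is an
in-memory illustration under assumed names (EncodingIndexSketch, addTerm,
lookup); it says nothing about how such data would be laid out in Lucene's
index files.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

class EncodingIndexSketch {

    private final Function<String, String> encoder;
    private final Map<String, List<String>> termsByCode = new HashMap<>();

    EncodingIndexSketch(Function<String, String> encoder) {
        this.encoder = encoder;
    }

    // Called once per term at write time: file the term under its code.
    void addTerm(String term) {
        termsByCode.computeIfAbsent(encoder.apply(term), k -> new ArrayList<>()).add(term);
    }

    // At search time, a single map lookup replaces the scan over all terms.
    List<String> lookup(String queryTerm) {
        return termsByCode.getOrDefault(encoder.apply(queryTerm), Collections.emptyList());
    }

    public static void main(String[] args) {
        // Stand-in encoder (an assumption): strip vowels, then collapse repeated letters.
        EncodingIndexSketch index = new EncodingIndexSketch(
            t -> t.toLowerCase(Locale.ROOT).replaceAll("[aeiou]", "").replaceAll("(.)\\1+", "$1"));
        index.addTerm("balloon");
        index.addTerm("bell");
        index.addTerm("baloon");
        System.out.println(index.lookup("ballon")); // [balloon, baloon]
    }
}
```

The trade-off is extra work and storage at write time in exchange for a
lookup whose cost no longer grows with the number of terms.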

The posted patch contains code under the LGPL that came directly from the
Phonetix library. This is my first patch, so I am not sure whether that is
OK, or whether I instead have to post the entire library and change the
build to link with it?

Let me know what you think.

Robert Engels





