lucene-java-user mailing list archives

From "Cam Bazz" <camb...@gmail.com>
Subject string similarity measures
Date Thu, 04 Sep 2008 12:38:34 GMT
Hello,
This has come up before, but if we were to build a swear-word filter, string
edit distances are no good. For example, a word like `shot` is confused with
`shit`. There is also a problem with words like Hitchcock. Apparently I need
something like Soundex or Double Metaphone. The thing is, these are
language-specific, and I am not operating in English.
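
To make the edit-distance problem concrete, here is a minimal sketch in plain
Java. It assumes the classic Levenshtein definition (insert, delete,
substitute, each with cost 1); the class name `EditDistanceDemo` is made up
for illustration and nothing here is a Lucene API.

```java
class EditDistanceDemo {
    // Levenshtein distance using two rolling rows of the DP table.
    // Illustrative helper only; not taken from Lucene.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        // Distance from the empty prefix of a to each prefix of b.
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int insertion = curr[j - 1] + 1;
                int deletion = prev[j] + 1;
                int substitution = prev[j - 1] + cost;
                curr[j] = Math.min(Math.min(insertion, deletion), substitution);
            }
            // Swap rows so prev always holds the last completed row.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        // A harmless word sits one substitution away from the blocked word.
        System.out.println(levenshtein("shot", "shit")); // prints 1
        System.out.println(levenshtein("shit", "shit")); // prints 0
    }
}
```

With a threshold of 1, which is what a fuzzy match needs in order to catch
simple misspellings, `shot` is indistinguishable from a deliberate
obfuscation of `shit`.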

Simply put, I need a fuzzy curse-word filter for Turkish.

Best regards,
-C.B.
