lucene-java-user mailing list archives

From "Cam Bazz" <>
Subject string similarity measures
Date Thu, 04 Sep 2008 12:38:34 GMT
This has come up before, but if we want to build a swear-word filter, plain
string edit distances are no good. For example, `shot` is only one letter away
from `shit`, so the two get confused. There is also the problem of innocent
words like Hitchcock, which contain a swear word as a substring. Apparently I
need something like Soundex or Double Metaphone. The catch is that these are
language-specific, and I am not working in English.
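To make the edit-distance problem concrete, here is a minimal Java sketch of the standard Levenshtein distance (insert, delete, and substitute each cost 1), plus a naive substring check. The class and method names (`EditDistanceDemo`, `levenshtein`, `containsBlocked`) are hypothetical and made up for illustration; they are not Lucene APIs.

```java
import java.util.Locale;

/** Hypothetical demo of why edit distance and substring matching misfire for a swear-word filter. */
public class EditDistanceDemo {

    /** Classic dynamic-programming Levenshtein distance using two rolling rows. */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a to the first j chars of b
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // distance from the first i chars of a to the empty prefix of b
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1,   // insertion
                                            prev[j] + 1),      // deletion
                                   prev[j - 1] + cost);        // substitution or match
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()]; // after the final swap, prev holds the last row
    }

    /** Naive substring test: flags any word that merely contains the blocked string. */
    static boolean containsBlocked(String word, String blocked) {
        return word.toLowerCase(Locale.ROOT).contains(blocked);
    }

    public static void main(String[] args) {
        // One substitution (o -> i) turns "shot" into "shit".
        System.out.println(levenshtein("shot", "shit"));
        // Substring matching flags the proper noun "Hitchcock".
        System.out.println(containsBlocked("Hitchcock", "cock"));
    }
}
```

Running `main` prints `1` and then `true`: any distance threshold loose enough to catch deliberate misspellings also catches `shot`, and substring matching flags the name.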

Simply put, I need a fuzzy curse-word filter for Turkish.
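One possible direction, sketched under stated assumptions rather than offered as a tested solution: because Turkish spelling is close to phonemic, a hand-written normalization key may play the role Soundex plays for English. The sketch lowercases with the Turkish locale (so `I` becomes dotless `ı`), folds ç, ğ, ı, ö, ş, ü to ASCII, maps a few leetspeak characters, collapses repeated letters, and then compares whole tokens against a blocklist normalized the same way, which avoids the Hitchcock-style substring hit. The class name, the leetspeak table, and the placeholder blocklist entry `küfür` ("swear word") are all hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical whole-token profanity filter built on a hand-written Turkish normalization key. */
public class TurkishProfanityFilter {

    private static final Locale TURKISH = Locale.forLanguageTag("tr");

    private final Set<String> blockedKeys = new HashSet<>();

    /** Blocklist entries go through the same key function as incoming text. */
    public TurkishProfanityFilter(List<String> blocklist) {
        for (String word : blocklist) {
            blockedKeys.add(key(word));
        }
    }

    /** Turkish lowercasing, letter folding, leetspeak mapping, then collapsing of repeated letters. */
    static String key(String token) {
        String s = token.toLowerCase(TURKISH); // capital I -> dotless i (U+0131), dotted capital I -> i
        StringBuilder out = new StringBuilder();
        char last = 0;
        for (char c : s.toCharArray()) {
            char m = fold(c);
            if (m == last) {
                continue; // collapse runs, e.g. "kuuufur" -> "kufur"
            }
            out.append(m);
            last = m;
        }
        return out.toString();
    }

    /** Maps Turkish-specific letters and a few (hypothetical) leetspeak characters to ASCII letters. */
    private static char fold(char c) {
        switch (c) {
            case '\u00e7': return 'c'; // c with cedilla
            case '\u011f': return 'g'; // soft g
            case '\u0131': return 'i'; // dotless i
            case '\u00f6': return 'o'; // o with umlaut
            case '\u015f': return 's'; // s with cedilla
            case '\u00fc': return 'u'; // u with umlaut
            case '0': return 'o';
            case '1': return 'i';
            case '3': return 'e';
            case '4': case '@': return 'a';
            case '5': case '$': return 's';
            default: return c;
        }
    }

    /** Returns the tokens whose key equals a blocked key; matching is whole-token, never substring. */
    public List<String> findBlocked(String text) {
        List<String> hits = new ArrayList<>();
        // Split on whitespace and common punctuation; leetspeak characters stay inside tokens.
        for (String token : text.split("[\\s.,;:!?\"'()]+")) {
            if (!token.isEmpty() && blockedKeys.contains(key(token))) {
                hits.add(token);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Placeholder blocklist entry (Unicode escapes keep the source pure ASCII).
        TurkishProfanityFilter filter = new TurkishProfanityFilter(List.of("k\u00fcf\u00fcr"));
        String text = "Bu bir K\u00dcF\u00dcR, bu da kuuufur. K\u00fcf\u00fcrbaz ise e\u015fle\u015fmez.";
        System.out.println(filter.findBlocked(text));
    }
}
```

With this input the filter should report `KÜFÜR` and `kuuufur` but not the compound `Küfürbaz`. That last case is also the main weakness: Turkish is agglutinative, so an inflected form such as `küfürler` yields a different key and slips through. A Turkish stemmer (later Lucene releases ship a Snowball-based `TurkishAnalyzer`) or a prefix match on each key would address that, at the price of reintroducing some false positives.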

Best regards,
