lucene-dev mailing list archives

From "Rodrigo Reyes" <>
Subject Re: Normalization
Date Wed, 13 Mar 2002 19:01:46 GMT
Hi Alex,

> Would it make sense to allow a full regex in the matching part? Could
> use regex or oromatcher packages. Don't know how that would affect your
> hashing though...

My answer is not really different from Brian's: you don't really need all
that power. Although I don't have significant experience with non-European
languages, this is not the first tool of this kind that I have written, and
to my knowledge you don't need more power than that. At least, not the kind
of additional expressiveness that regexps provide (although, as I mentioned
in another mail, you may need restrictions on the size of the input or
output string; for example, soundex specifies a 4-letter limit that the
language does not currently address).
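
To illustrate the kind of size restriction meant here, below is a minimal
sketch in Python of a post-processing step that forces a code to a fixed
length. It is not part of the normalizer language; the name limit_length and
its parameters are hypothetical. The only facts taken from soundex are the
4-character length and the zero padding.

```python
def limit_length(code: str, size: int = 4, pad: str = "0") -> str:
    """Truncate or right-pad a code to exactly `size` characters.

    Hypothetical helper: soundex codes are 4 characters long and are
    padded with zeros when the encoded name is too short.
    """
    # Slice first so long codes are cut, then pad so short codes are filled.
    return code[:size].ljust(size, pad)


if __name__ == "__main__":
    for raw in ("R16345", "R1", "R163"):
        print(raw, "->", limit_length(raw))
```

A step like this could run after the rewriting rules, which is why it does
not require regexps in the rules themselves.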

However, I'd be very interested in hearing about counter-examples that would
need it. The only one I could find was the annoyance of having to remove
sequences of the same letter, which was awkward to express with plain rules,
so I added an option called "uniquify" to do the job more easily (as you can
see in the soundex or French normalizer).
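
For readers who have not seen the option, here is a rough sketch in Python of
what removing sequences of the same letter amounts to. It assumes that
"uniquify" collapses each run of identical adjacent characters to a single
character; the actual option in the normalizer may differ in detail, and the
function below is only an illustration.

```python
from itertools import groupby


def uniquify(text: str) -> str:
    """Collapse each run of identical adjacent characters to one character.

    Assumed behavior of the "uniquify" option, not its real implementation.
    """
    # groupby yields one (key, group) pair per run of equal characters,
    # so keeping only the keys keeps one character per run.
    return "".join(key for key, _ in groupby(text))


if __name__ == "__main__":
    for word in ("pfeiffer", "aabbbc", "abab"):
        print(word, "->", uniquify(word))
```

Only adjacent repeats are removed, so a letter that reappears later in the
word is kept.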

