lucene-dev mailing list archives

From "Mark Tucker" <MTuc...@infoimage.com>
Subject RE: Normalization
Date Mon, 11 Mar 2002 21:28:58 GMT
You can't learn if you don't ask the question.  Thanks for your response.

Mark

-----Original Message-----
From: Rodrigo Reyes [mailto:reyes@charabia.net]
Sent: Monday, March 11, 2002 2:26 PM
To: Lucene Developers List
Subject: Re: Normalization



 Well, choosing XML for such a description language has the following
drawbacks:

 * It is hardly legible. Having one rule per line is really nice; I
appreciated it when writing the French normalizer.

 * It does not solve all the parsing problems.
    - Either you have to specify everything as elements or attributes, and
it's painful:
         <leftContext><range value="aeiou"/>er<range value="tr"/></leftContext>
         <rightContext>er<boundary/></rightContext>
    - Or you have to write a parser anyway to parse the content of the
elements:

         <rightContext>[aeiou]r$</rightContext>
        and therefore write a parser for the content that the XML parser
has already parsed.
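
 To make the second point concrete, here is a minimal sketch in Python
(not Lucene code, and not the actual normalizer). It assumes the element
names from the XML proposal quoted below and a made-up mini-syntax for
context expressions, where [abc] is a character range and $ is a word
boundary. The names parse_context and load_rules are hypothetical. The
XML parser only returns the string "[aeiou]r$"; a second, hand-written
parser is still needed to make sense of it.

```python
import xml.etree.ElementTree as ET

# Hypothetical rule file: element names come from the XML proposal in this
# thread, the rightContext value is the example from the message above.
RULE_XML = """
<normalizer>
  <rule>
    <leftContext></leftContext>
    <rightContext>[aeiou]r$</rightContext>
    <transformLetters></transformLetters>
    <replacementString></replacementString>
  </rule>
</normalizer>
"""


def parse_context(expr):
    """Second-stage parser: split a context expression into tokens.

    Assumed mini-syntax (an illustration, not a documented format):
    [abc] is a character range, $ is a word boundary, and any other
    character is a literal.
    """
    tokens = []
    i = 0
    while i < len(expr):
        ch = expr[i]
        if ch == "[":
            end = expr.find("]", i)
            if end == -1:
                raise ValueError("unterminated range in %r" % expr)
            tokens.append(("range", expr[i + 1:end]))
            i = end + 1
        elif ch == "$":
            tokens.append(("boundary", ""))
            i += 1
        else:
            tokens.append(("literal", ch))
            i += 1
    return tokens


def load_rules(xml_text):
    """First stage: the XML parser hands back element text as opaque strings."""
    root = ET.fromstring(xml_text)
    rules = []
    for rule in root.findall("rule"):
        raw = rule.findtext("rightContext") or ""
        # The XML layer knows nothing about ranges or boundaries, so the
        # raw string has to go through the second parser.
        rules.append({"raw_right": raw, "right": parse_context(raw)})
    return rules


rules = load_rules(RULE_XML)
```

 Here rules[0]["raw_right"] is just the text of the element, and only
parse_context turns it into a range, a literal and a boundary. XML moves
the problem into the element content; it does not remove it.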


 Rodrigo
----- Original Message -----
From: "Mark Tucker" <MTucker@infoimage.com>
To: "Lucene Developers List" <lucene-dev@jakarta.apache.org>
Sent: Monday, March 11, 2002 10:10 PM
Subject: RE: Normalization


> Why not use XML?
>
> <normalizer>
> <rule>
> <leftContext></leftContext>
> <rightContext></rightContext>
> <transformLetters></transformLetters>
> <replacementString></replacementString>
> </rule>
> <rule>
> <leftContext></leftContext>
> <rightContext></rightContext>
> <transformLetters></transformLetters>
> <replacementString></replacementString>
> </rule>
> </normalizer>
>
>
> There are some issues with the characters you use, but using XML might
> make it easier to extend.
>
> Mark



--
To unsubscribe, e-mail:   <mailto:lucene-dev-unsubscribe@jakarta.apache.org>
For additional commands, e-mail: <mailto:lucene-dev-help@jakarta.apache.org>



