lucene-java-user mailing list archives

From Edwin Mol <>
Subject International Stemmers and Character Encoding
Date Sat, 11 Jun 2005 08:39:23 GMT
I have downloaded the analyzer sources from the sandbox area, but every
*Stemmer class fails to compile with the error:
"Invalid Character Constant".
Here is what a code snippet from the DutchStemmer class looks like:

  // Substitute ä, ë, ï, ö, ü, á, é, í, ó, ú
  private void substitute(StringBuffer buffer) {
    for (int i = 0; i < buffer.length(); i++) {
      switch (buffer.charAt(i)) {
        case 'ä':
        case 'á':
          buffer.setCharAt(i, 'a');
          break;
        case 'ë':
        case 'é':

In this example, the 'ä' character constant causes the error.

I think the code is broken because the Java file was saved or read with
the wrong character encoding.
Does anyone know if I'm correct and, more importantly, how to solve this?


Edwin Mol
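
A minimal sketch of how an encoding mismatch could produce this kind of error, assuming (this is not confirmed anywhere in the message) that the source file holds UTF-8 bytes and the compiler reads it as ISO-8859-1. The class name EncodingMismatchDemo and the helper misdecode are hypothetical, chosen only for illustration. The single character 'ä' becomes two characters after the wrong decoding, and two characters between single quotes are not a valid character constant.

```java
import java.nio.charset.StandardCharsets;

public class EncodingMismatchDemo {

    // Hypothetical helper: encodes the text as UTF-8, then decodes those
    // bytes as ISO-8859-1, imitating a compiler that assumes the wrong
    // source encoding.
    static String misdecode(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // The Unicode escape keeps this file independent of its own encoding.
        String original = "\u00e4"; // the letter a with diaeresis
        String garbled = misdecode(original);

        // One character in the source becomes two after the wrong decoding.
        System.out.println("original length: " + original.length());
        System.out.println("garbled length:  " + garbled.length());
    }
}
```

If this is the cause, two possible remedies are to pass the file's real encoding to the compiler with javac's -encoding option, or to replace the literals with Unicode escapes such as '\u00e4', which compile the same way under any source encoding.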
