lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Chris Hostetter <>
Subject Re: Lucene standard analyzer internationalization
Date Tue, 22 Apr 2008 21:03:48 GMT

: Yes the version of lucene and java are exactly the same on the different
: machines.
: Infact we unjared lucene and jared it with our jar and are running from the
: same nfs mounts on both the machines

i didn't do an indepth code read, but a quick skim of 
StandardTokenizerImpl didn't turn up any questionale uses of APIs that 
might have differnet behavior depending on the default locale/charset of 
the JVM running it ... everthing is simple char or String based access.

Are you *certain* that you are providing Lucene with the Strings you think 
you are?  Is it possible that you are using a FileReader or 
InputStreamReader that rely on the default character encoding of the JVM 
(which may not be correct for the data you are reading in) ?

Can you write a simple junit test that fails on one machine and passes on 
the other?  If so i'd love to see that test along with the output of this 

  java.util.Enumeration e = System.getProperties().propertyNames();
  while(e.hasMoreElements()) {
    String prop = (String)e.nextElement();
    System.out.println(prop + " = " +,


To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message