lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Chris Hostetter <hossman_luc...@fucit.org>
Subject Re: Lucene standard analyzer internationalization
Date Tue, 22 Apr 2008 21:03:48 GMT

: Yes the version of lucene and java are exactly the same on the different
: machines.
: Infact we unjared lucene and jared it with our jar and are running from the
: same nfs mounts on both the machines

i didn't do an indepth code read, but a quick skim of 
StandardTokenizerImpl didn't turn up any questionale uses of APIs that 
might have differnet behavior depending on the default locale/charset of 
the JVM running it ... everthing is simple char or String based access.

Are you *certain* that you are providing Lucene with the Strings you think 
you are?  Is it possible that you are using a FileReader or 
InputStreamReader that rely on the default character encoding of the JVM 
(which may not be correct for the data you are reading in) ?

Can you write a simple junit test that fails on one machine and passes on 
the other?  If so i'd love to see that test along with the output of this 
code...

  java.util.Enumeration e = System.getProperties().propertyNames();
  while(e.hasMoreElements()) {
    String prop = (String)e.nextElement();
    System.out.println(prop + " = " + java.net.URLEncoder.encode(System.getProperty(prop),
"US-ASCII"));
  }


-Hoss

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message