lucene-java-user mailing list archives

From "Uwe Schindler" <...@thetaphi.de>
Subject RE: utf-8 issues depending on host
Date Tue, 23 May 2017 20:07:26 GMT
Hi,

FileReader is a broken class; this is well known. It always decodes with the platform default
charset, which can differ between hosts. For that reason it is on the forbidden-apis list,
which Lucene also uses to prevent issues like yours in our own source code. To specify the
character set correctly when reading a file, you have to use a FileInputStream and wrap it
with an InputStreamReader. You can pass the charset to the InputStreamReader constructor.
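
A minimal sketch of that wrapping, assuming the source files are UTF-8 encoded (the thread
does not confirm this). The class name ExplicitCharsetReader and the helpers open and readAll
are hypothetical names used only for illustration:

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

public class ExplicitCharsetReader {

    // Opens a file with an explicit charset, so decoding does not depend on
    // the platform default charset of the host running the code.
    static Reader open(File file, Charset charset) throws IOException {
        return new InputStreamReader(new FileInputStream(file), charset);
    }

    // Demonstration helper: drains a Reader into a String.
    static String readAll(Reader reader) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[4096];
        int n;
        while ((n = reader.read(buf)) != -1) {
            sb.append(buf, 0, n);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Write a small UTF-8 file containing non-ASCII characters.
        // Unicode escapes keep this source file independent of compiler encoding.
        String original = "G\u00fclery\u00fcz";
        File file = File.createTempFile("utf8-demo", ".txt");
        file.deleteOnExit();
        Files.write(file.toPath(), original.getBytes(StandardCharsets.UTF_8));

        // Read it back with the charset stated explicitly (assumed UTF-8 here).
        try (Reader reader = open(file, StandardCharsets.UTF_8)) {
            String decoded = readAll(reader);
            System.out.println("round trip ok: " + decoded.equals(original));
        }

        // For comparison: the default that new FileReader(file) would use.
        System.out.println("platform default charset: " + Charset.defaultCharset());
    }
}
```

A Reader opened this way can be passed to TextField(String name, Reader reader) in place of
new FileReader(file). Printing Charset.defaultCharset() on each host is one way to check
whether the hosts really differ in their default.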

See https://github.com/policeman-tools/forbidden-apis

Uwe

-----
Uwe Schindler
Achterdiek 19, D-28357 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de

> -----Original Message-----
> From: Kudrettin Güleryüz [mailto:kudrettin@gmail.com]
> Sent: Tuesday, May 23, 2017 9:13 PM
> To: java-user@lucene.apache.org
> Subject: Re: utf-8 issues depending on host
> 
> I create the object as new FileReader(file)
> Where file is read from File.listFiles() as below:
> cwd.listFiles(getSourceCodeFilter())
> File file : files
> 
> FileReader doesn't seem to have a constructor that lets me specify an
> encoding, and in fact I feel like I should not be setting it to UTF-8 by
> default, anyways.
> 
> Let me revise my question, how can I make sure all hosts running this
> indexer code behave as expected? It certainly runs as expected on one
> machine while not on others. One that runs as expected is Debian 8.3 others
> are Debian 7.4.
> 
> Thank you
> 
> On Tue, May 23, 2017 at 10:45 AM Adrien Grand <jpountz@gmail.com>
> wrote:
> 
> > The issue is likely due to how you create the FileReader that you pass to
> > TextField. Maybe you don't give it the right encoding?
> >
> > Le mar. 23 mai 2017 à 16:38, Kudrettin Güleryüz <kudrettin@gmail.com> a
> > écrit :
> >
> > > Hi,
> > >
> > > Depending on the host running indexer, UTF-8 characters are not stored
> > > (not correctly, anyways) in Lucene index.
> > >
> > > Interestingly, locale output is identical on all hosts but the output is
> > > different.
> > >
> > > Apparently using FileReader could be the culprit.  I am currently using
> > > TextField(String name, Reader reader)
> > >
> > > How can I improve this? What is the suggested way for handling this using
> > > 5.2.1? TextField(String name, String value, Store store)?
> > >
> > > Thank you
> > >
> >



