lucene-java-user mailing list archives

From Kudrettin Güleryüz <kudret...@gmail.com>
Subject Re: utf-8 issues depending on host
Date Tue, 23 May 2017 19:13:14 GMT
I create the object as new FileReader(file), where file comes from
File.listFiles() as below:

    File[] files = cwd.listFiles(getSourceCodeFilter());
    for (File file : files)

FileReader doesn't seem to have a constructor that lets me specify an
encoding, and in fact I feel like I should not be setting it to UTF-8 by
default, anyways.

Let me revise my question: how can I make sure all hosts running this
indexer code behave as expected? It certainly runs as expected on one
machine but not on the others. The one that runs as expected is Debian 8.3;
the others are Debian 7.4.
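
For reference, a minimal sketch of one way to take the host out of the picture. FileReader(File) decodes with the JVM's default charset, which can differ between machines; wrapping a FileInputStream in an InputStreamReader lets the caller name the charset explicitly. The class name ExplicitCharsetReader and the helper open() are hypothetical, and UTF-8 is only an assumption about how the source files are encoded. The resulting Reader could then be passed wherever the FileReader was used.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

class ExplicitCharsetReader {

    // Hypothetical helper: opens a file with a caller-chosen charset
    // instead of the JVM default charset that FileReader relies on.
    static Reader open(File file, Charset charset) throws IOException {
        return new InputStreamReader(new FileInputStream(file), charset);
    }

    public static void main(String[] args) throws IOException {
        // Diagnostic: print what each host's JVM treats as its default.
        System.out.println("default charset: " + Charset.defaultCharset());
        System.out.println("file.encoding:   " + System.getProperty("file.encoding"));

        // Write a small UTF-8 sample file (assumed encoding), then read it back.
        File sample = Files.createTempFile("sample", ".txt").toFile();
        Files.write(sample.toPath(),
                "G\u00fclery\u00fcz".getBytes(StandardCharsets.UTF_8));

        try (BufferedReader reader =
                 new BufferedReader(open(sample, StandardCharsets.UTF_8))) {
            String line = reader.readLine();
            // Decoding no longer depends on the host's default charset.
            System.out.println("decoded correctly: "
                    + "G\u00fclery\u00fcz".equals(line));
        } finally {
            sample.delete();
        }
    }
}
```

Comparing the two diagnostic lines across the Debian 8.3 and 7.4 hosts may show whether the JVM default charset is what differs. If the files are not all UTF-8, the charset would have to be chosen per file rather than hard-coded.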

Thank you

On Tue, May 23, 2017 at 10:45 AM Adrien Grand <jpountz@gmail.com> wrote:

> The issue is likely due to how you create the FileReader that you pass to
> TextField. Maybe you don't give it the right encoding?
>
> On Tue, May 23, 2017 at 16:38, Kudrettin Güleryüz <kudrettin@gmail.com>
> wrote:
>
> > Hi,
> >
> > Depending on the host running indexer, UTF-8 characters are not stored
> > (not correctly, anyways) in Lucene index.
> >
> > Interestingly, locale output is identical on all hosts but the indexing
> > result is different.
> >
> > Apparently using FileReader could be the culprit.  I am currently using
> > TextField(String name, Reader reader)
> >
> > How can I improve this? What is the suggested way for handling this using
> > 5.2.1? TextField(String name, String value, Store store)?
> >
> > Thank you
> >
>
