lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Patrick Diviacco <patrick.divia...@gmail.com>
Subject Re: file formats: MacRoman and UTF-8...
Date Mon, 28 Mar 2011 07:35:29 GMT
thanks, solved

On 28 March 2011 09:30, Uwe Schindler <uwe@thetaphi.de> wrote:

> Hi,
>
> Replace the "stupid":
> writer = new BufferedWriter(new FileWriter(fileOutput));
>
> by:
> writer = new BufferedWriter(new OutputStreamWriter(new
> FileOutputStream(fileOutput), "UTF-8"));
>
> Unfortunately, you cannot give a charset to FileWriter itself.
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: uwe@thetaphi.de
>
>
> > -----Original Message-----
> > From: Patrick Diviacco [mailto:patrick.diviacco@gmail.com]
> > Sent: Monday, March 28, 2011 9:17 AM
> > To: java-user@lucene.apache.org
> > Subject: Re: file formats: MacRoman and UTF-8...
> >
> > hi, I'm using my own code:
> >
> >
> >
> > Writer writer = null;
> >
> > try {
> > //File fileOutput = new File("output.trectext"); File fileOutput = new
> > File(args[1]); writer = new BufferedWriter(new FileWriter(fileOutput));
> > writer.write(contents.toString());
> > } catch (FileNotFoundException e) {
> > e.printStackTrace();
> > } catch (IOException e) {
> > e.printStackTrace();
> > } finally {
> > try {
> > if (writer != null) {
> > writer.close();
> > }
> > } catch (IOException e) {
> > e.printStackTrace();
> > }
> > }
> >
> >
> >
> >
> >
> > On 28 March 2011 09:13, Paul Libbrecht <paul@hoplahup.net> wrote:
> >
> > > java -Dfile.encoding=utf-8
> > > should do the trick.
> > >
> > > Or... which java app are you using?
> > >
> > > paul
> > >
> > >
> > > Le 28 mars 2011 à 09:03, Patrick Diviacco a écrit :
> > >
> > > > When I run my Lucene app and a parse a xml file I get the following
> > > > error due to some fonts such as "é" written in the text file.
> > > >
> > > > If I save the text file as UTF-8 with my text editor I don't have
> > > > this issue, but when I create it with a java app, it is saved as
> MacRoman.
> > > >
> > > > How can I specify a different format with Java instead ?
> > > >
> > > > thanks
> > > >
> > > > [CODE]Exception in thread "main"
> > > >
> > >
> > com.sun.org.apache.xerces.internal.impl.io.MalformedByteSequenceExcepti
> > on:
> > > > Invalid byte 1 of 1-byte UTF-8 sequence.
> > > > at
> > > >
> > > com.sun.org.apache.xerces.internal.impl.io.UTF8Reader.invalidByte(UTF8
> > > Reader.java:684)
> > > > at
> > > >
> > > com.sun.org.apache.xerces.internal.impl.io.UTF8Reader.read(UTF8Reader.
> > > java:554)
> > > > at
> > > >
> > > com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.load(XMLEntit
> > > yScanner.java:1742)
> > > > at
> > > >
> > > com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.skipChar(XMLE
> > > ntityScanner.java:1416)
> > > > at
> > > >
> > >
> > com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerIm
> > pl
> > >
> > $FragmentContentDriver.next(XMLDocumentFragmentScannerImpl.java:279
> > 2)
> > > > at
> > > >
> > >
> > com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(X
> > M
> > > LDocumentScannerImpl.java:648)
> > > > at
> > > >
> > >
> > com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerIm
> > pl
> > > .scanDocument(XMLDocumentFragmentScannerImpl.java:511)
> > > > at
> > > >
> > >
> > com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XM
> > > L11Configuration.java:808)
> > > > at
> > > >
> > >
> > com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XM
> > > L11Configuration.java:737)
> > > > at
> > > >
> > > com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.j
> > > ava:119)
> > > > at
> > > >
> > > com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(Abs
> > > tractSAXParser.java:1205)
> > > > at
> > > >
> > >
> > com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.pa
> > > rse(SAXParserImpl.java:522)
> > > > at org.apache.commons.digester.Digester.parse(Digester.java:1871)
> > > > at CollectionIndexer.main(CollectionIndexer.java:111)[/CODE]
> > >
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > > For additional commands, e-mail: java-user-help@lucene.apache.org
> > >
> > >
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message