lucene-java-user mailing list archives

From "Uwe Schindler" <...@thetaphi.de>
Subject RE: file formats: MacRoman and UTF-8...
Date Mon, 28 Mar 2011 07:30:39 GMT
Hi,

Replace the "stupid":
writer = new BufferedWriter(new FileWriter(fileOutput));

by:
writer = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(fileOutput), "UTF-8"));

Unfortunately, you cannot give a charset to FileWriter itself.
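
As a minimal, self-contained sketch of the replacement above (assuming Java 7 or later for try-with-resources and StandardCharsets; the class name Utf8WriteSketch, the helper writeUtf8 and the temp file are made up for illustration and are not part of your code):

```java
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

public class Utf8WriteSketch {

    // Hypothetical helper: writes the text as UTF-8, regardless of the
    // platform default charset (MacRoman, windows-1252, ...).
    public static void writeUtf8(File fileOutput, String contents) throws IOException {
        // try-with-resources closes the writer (and flushes it) even on error
        try (Writer writer = new BufferedWriter(
                new OutputStreamWriter(new FileOutputStream(fileOutput), StandardCharsets.UTF_8))) {
            writer.write(contents);
        }
    }

    public static void main(String[] args) throws IOException {
        // Temp file used only for this illustration
        File fileOutput = File.createTempFile("utf8-sketch", ".xml");
        fileOutput.deleteOnExit();

        // "\u00e9" is the character "é"; in UTF-8 it is the two bytes C3 A9
        writeUtf8(fileOutput, "<doc>\u00e9</doc>");

        // Dump the bytes that actually landed on disk
        byte[] bytes = Files.readAllBytes(fileOutput.toPath());
        StringBuilder hex = new StringBuilder();
        for (byte b : bytes) {
            hex.append(String.format("%02x ", b & 0xFF));
        }
        System.out.println(hex.toString().trim());
    }
}
```

On Java 6 the same idea works with the charset name "UTF-8" and an explicit finally block, as in the one-liner above. Later Java releases (11 and up) added FileWriter constructors that accept a Charset, so the restriction only applies to older versions.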

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de


> -----Original Message-----
> From: Patrick Diviacco [mailto:patrick.diviacco@gmail.com]
> Sent: Monday, March 28, 2011 9:17 AM
> To: java-user@lucene.apache.org
> Subject: Re: file formats: MacRoman and UTF-8...
> 
> hi, I'm using my own code:
> 
> 
> 
> Writer writer = null;
> 
> try {
> //File fileOutput = new File("output.trectext");
> File fileOutput = new File(args[1]);
> writer = new BufferedWriter(new FileWriter(fileOutput));
> writer.write(contents.toString());
> } catch (FileNotFoundException e) {
> e.printStackTrace();
> } catch (IOException e) {
> e.printStackTrace();
> } finally {
> try {
> if (writer != null) {
> writer.close();
> }
> } catch (IOException e) {
> e.printStackTrace();
> }
> }
> 
> 
> 
> 
> 
> On 28 March 2011 09:13, Paul Libbrecht <paul@hoplahup.net> wrote:
> 
> > java -Dfile.encoding=utf-8
> > should do the trick.
> >
> > Or... which java app are you using?
> >
> > paul
> >
> >
> > On 28 March 2011 at 09:03, Patrick Diviacco wrote:
> >
> > > When I run my Lucene app and parse an XML file, I get the following
> > > error due to some characters such as "é" written in the text file.
> > >
> > > If I save the text file as UTF-8 with my text editor I don't have
> > > this issue, but when I create it with a Java app, it is saved as
> > > MacRoman.
> > >
> > > How can I specify a different encoding with Java instead?
> > >
> > > thanks
> > >
> > > [CODE]Exception in thread "main" com.sun.org.apache.xerces.internal.impl.io.MalformedByteSequenceException: Invalid byte 1 of 1-byte UTF-8 sequence.
> > > at com.sun.org.apache.xerces.internal.impl.io.UTF8Reader.invalidByte(UTF8Reader.java:684)
> > > at com.sun.org.apache.xerces.internal.impl.io.UTF8Reader.read(UTF8Reader.java:554)
> > > at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.load(XMLEntityScanner.java:1742)
> > > at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.skipChar(XMLEntityScanner.java:1416)
> > > at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(XMLDocumentFragmentScannerImpl.java:2792)
> > > at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:648)
> > > at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:511)
> > > at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:808)
> > > at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:737)
> > > at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:119)
> > > at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1205)
> > > at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:522)
> > > at org.apache.commons.digester.Digester.parse(Digester.java:1871)
> > > at CollectionIndexer.main(CollectionIndexer.java:111)[/CODE]
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-user-help@lucene.apache.org
> >
> >

