jackrabbit-users mailing list archives

From Marcel Reutegger <marcel.reuteg...@gmx.net>
Subject Re: MalformedInputException on Linux with MsPowerPointTextExtractor
Date Tue, 04 Nov 2008 13:01:16 GMT
Hi Justin,

I guess this is a bug. Can you please file a JIRA issue and include more of the
stack trace? The excerpt does not show which part of Jackrabbit causes the exception.



Justin Grunau wrote:
> Jackrabbit text extractors return Readers from their extractText methods.
> In the case of PowerPoint files, I am finding that on Linux alone, I get the
> following exception stack trace when I attempt to read anything from the Reader
> returned from the MsPowerPointTextExtractor.extractText method:
> sun.io.MalformedInputException
>         at sun.io.ByteToCharUTF8.convert(ByteToCharUTF8.java:262)
>         at sun.nio.cs.StreamDecoder$ConverterSD.convertInto(StreamDecoder.java:314)
>         at sun.nio.cs.StreamDecoder$ConverterSD.implRead(StreamDecoder.java:345)
>         at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:250)
>         at sun.nio.cs.StreamDecoder.read0(StreamDecoder.java:199)
>         at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:185)
>         at java.io.InputStreamReader.read(InputStreamReader.java:196)
> Of course I have no control over what encoding any PowerPoint documents happen
> to be in (nor can I determine the encoding without using some sort of parser to
> read the file). I also know of no way to tell an InputStreamReader what encoding
> to convert into. It simply appears that whatever the default encoding of the
> operating system is (in this case, UTF8) will be used.
> As of now, I have no way to reliably use the Jackrabbit
> MsPowerPointTextExtractor on Linux at all -- it works fine for me on Windows.
> Any suggestions?
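
On the InputStreamReader point in the quoted message: the JDK does let the caller name the charset, so decoding need not depend on the platform default. The following is only a minimal sketch of that idea, not the extractor's actual code or a confirmed fix for this report. The class ExplicitCharsetDemo and the helper readAll are hypothetical names, the sample bytes are made up, and the sketch assumes Java 8 or later.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of Jackrabbit.
final class ExplicitCharsetDemo {

    // Decodes the whole stream with the charset chosen by the caller
    // instead of the platform default. Malformed or unmappable bytes
    // are replaced with U+FFFD rather than raising an exception.
    static String readAll(InputStream in, Charset charset) {
        CharsetDecoder decoder = charset.newDecoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE);
        StringBuilder text = new StringBuilder();
        try (Reader reader = new InputStreamReader(in, decoder)) {
            char[] buffer = new char[1024];
            int count;
            while ((count = reader.read(buffer)) != -1) {
                text.append(buffer, 0, count);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        // Made-up sample: text encoded as UTF-16LE, decoded as UTF-16LE.
        byte[] utf16 = "Slide title".getBytes(StandardCharsets.UTF_16LE);
        System.out.println(
                readAll(new ByteArrayInputStream(utf16), StandardCharsets.UTF_16LE));

        // 0xFF is never valid in UTF-8; it becomes U+FFFD instead of failing.
        byte[] invalidUtf8 = {'o', 'k', (byte) 0xFF};
        System.out.println(
                readAll(new ByteArrayInputStream(invalidUtf8), StandardCharsets.UTF_8));
    }
}
```

Whether this helps here depends on where the extractor builds its Reader, which the partial stack trace does not show.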
