poi-user mailing list archives

From Rainer Schwarze <...@admadic.de>
Subject Re: Problem with word documents
Date Thu, 29 Nov 2007 12:19:40 GMT
chris.b wrote:
> i found a simpler way round this, i read somewhere that open office simply
> converted .rtf files to .doc, so i just handle it using the rtf handler, but
> i wanted to do it so it would try handling it with poi, in case it catches a
> IllegalPropertySetDataException, to handle it with rtf handler, but it
> always gives me the exception...

The RTF handler should not be able to read a Word document. Word can
pretend that an RTF is a Word file while actually it is still an RTF,
but if it is a Word file and not RTF, then RTF readers cannot handle it.
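
One way to tell the two cases apart without relying on exceptions is to
look at the first bytes of the file. The following is only a minimal
sketch, not something taken from POI: the class name FormatSniffer and
the method sniff are made up for illustration. It assumes that a binary
Word file is an OLE2 compound document (signature D0 CF 11 E0 A1 B1 1A E1)
and that an RTF file starts with the ASCII text "{\rtf".

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of POI or the JDK.
public class FormatSniffer {
    // Signature of an OLE2 compound document (binary .doc files).
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };
    // RTF files begin with the ASCII characters {\rtf
    private static final byte[] RTF_MAGIC =
        "{\\rtf".getBytes(StandardCharsets.US_ASCII);

    // Reads up to 8 bytes and returns "OLE2", "RTF" or "UNKNOWN".
    public static String sniff(InputStream in) throws IOException {
        byte[] head = new byte[8];
        int n = 0;
        while (n < head.length) {
            int r = in.read(head, n, head.length - n);
            if (r < 0) {
                break; // end of stream before 8 bytes were read
            }
            n += r;
        }
        if (startsWith(head, n, OLE2_MAGIC)) {
            return "OLE2";
        }
        if (startsWith(head, n, RTF_MAGIC)) {
            return "RTF";
        }
        return "UNKNOWN";
    }

    // True if the first 'len' valid bytes of 'data' begin with 'prefix'.
    private static boolean startsWith(byte[] data, int len, byte[] prefix) {
        if (len < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        // In-memory sample instead of a real file, to keep this self-contained.
        byte[] rtf = "{\\rtf1\\ansi Hello}".getBytes(StandardCharsets.US_ASCII);
        System.out.println(sniff(new ByteArrayInputStream(rtf)));
    }
}
```

With a real file one would pass a FileInputStream to sniff and then open
a fresh stream for whichever handler matches, since the sniffing consumes
the first bytes.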

What do you mean by "always gives me the exception"? Do all your files
throw this exception, or does the exception escape the try/catch
block? (If all your files were created with OpenOffice, they are all
likely to throw that exception if one of them does.) BTW: which
version of OpenOffice was used to create the files?

> Example of what i'm trying to do:
> I don't know if there's any "bad programming in this", for now i just wanted
> it to work :p

This is what I tried:
(The declaration of fisdoc was not included in your copied code - I
assume you created a second FileInputStream...)

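import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import javax.swing.text.BadLocationException;
import javax.swing.text.DefaultStyledDocument;
import javax.swing.text.rtf.RTFEditorKit;
import org.apache.poi.hpsf.IllegalPropertySetDataException;
import org.apache.poi.hwpf.extractor.WordExtractor;

public class ReadDoc { // class name is arbitrary
public static void main(String[] args)
throws IOException, BadLocationException {
  String content = null;
  File f = new File("monte.doc");
  FileInputStream docfin = new FileInputStream(f.getAbsolutePath());
  try {
    // First attempt: read the file as a Word document with POI.
    WordExtractor docextractor = new WordExtractor(docfin);
    content = docextractor.getText();
  } catch (IllegalPropertySetDataException e) {
    try {
      // Fallback: reopen the file and try to parse it as RTF.
      System.out.println("exc caught");
      FileInputStream fisdoc = new FileInputStream(f.getAbsolutePath());
      DefaultStyledDocument styledDoc = new DefaultStyledDocument();
      new RTFEditorKit().read(fisdoc, styledDoc, 0);
      System.out.println(
        "styledDoc.getLength() = " + styledDoc.getLength());
      content = styledDoc.getText(0, styledDoc.getLength());
    } catch (Throwable t) {
      System.out.println("exc2 caught");
    }
  }
  System.out.println("[" + content + "]");
}
}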

It prints this:

exc caught
styledDoc.getLength() = 0

I assume that the RTFReader (used in the RTFEditorKit) only encounters
binary data and never hits a keyword which it interprets as RTF. So it
does not read any real content from the file. (I would have expected
it to throw an exception - which is why I wanted to see what happens
for myself :-) )

Best wishes,

