poi-user mailing list archives

From Rainer Schwarze <...@admadic.de>
Subject Re: Problem with word documents
Date Tue, 27 Nov 2007 23:56:43 GMT
chris.b wrote:
> here's a sample file that i wasn't able to index
> http://www.nabble.com/file/p13972759/monte.doc monte.doc 
> thanks for the help :)

As a last thing today I took a quick look at the file. A quick fix
might be to skip the readProperties() call in the HWPFDocument
constructor (I don't know offhand whether the properties are really
needed if you only read the Word file):

 public HWPFDocument(POIFSFileSystem pfilesystem) throws IOException {
     // Sort out the hpsf properties
     filesystem = pfilesystem;
     readProperties();    // <---- remove that one
     // ... (rest of the constructor unchanged)
 }

Depending on how much work you intend to do, you could either comment
the line out and rebuild the library, or subclass HWPFDocument and
override readProperties() with an empty method (which is what I would
recommend trying first). In the second case, it should be enough to
change the WordExtractor constructor call in the code you posted to:

WordExtractor docextractor = new WordExtractor(new MyHWPFDoc(docfin));

(MyHWPFDoc being a subclass of HWPFDocument with an empty
readProperties() method.)
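
To show why the empty override helps, here is a minimal, self-contained
sketch of the mechanism. It deliberately does not use the POI classes:
BaseDoc and LenientDoc are hypothetical stand-ins for HWPFDocument and
MyHWPFDoc, and the IOException is a simulated failure, not what POI
actually reports for this file. The point is only that a method called
from a superclass constructor is dispatched to the subclass override,
so the property parsing is skipped.

```java
import java.io.IOException;

public class SkipPropertiesSketch {

    /** Hypothetical stand-in for HWPFDocument: its constructor calls readProperties(). */
    static class BaseDoc {
        protected final String text;

        BaseDoc(String text) throws IOException {
            readProperties();   // virtual call: a subclass override runs here instead
            this.text = text;
        }

        /** Simulates property parsing that fails on a problematic file. */
        protected void readProperties() throws IOException {
            throw new IOException("cannot read document properties");
        }

        String getText() {
            return text;
        }
    }

    /** Plays the role of MyHWPFDoc: same document, property parsing skipped. */
    static class LenientDoc extends BaseDoc {
        LenientDoc(String text) throws IOException {
            super(text);
        }

        @Override
        protected void readProperties() {
            // intentionally empty: skip the step that fails
        }
    }

    public static void main(String[] args) throws IOException {
        try {
            new BaseDoc("hello");
        } catch (IOException e) {
            System.out.println("base failed: " + e.getMessage());
        }
        System.out.println("lenient text: " + new LenientDoc("hello").getText());
    }
}
```

With the real library the subclass would extend HWPFDocument and pass
its constructor argument through to super(...) in the same way. Whether
anything downstream still expects the properties to be present is
something I have not checked.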

Best wishes, Rainer
