poi-user mailing list archives

From "chris.b" <omelhornomedomu...@gmail.com>
Subject Re: Problem with word documents
Date Thu, 29 Nov 2007 09:51:24 GMT

Is there any way to catch the IllegalPropertySetDataException?
It seems to me that the documents have an RTF header but the text is
encoded the same way as in a Word document (I don't know if this makes sense),
because the RTF handler reads the documents but doesn't index their
contents :s

thank you,
Chris


Rainer Schwarze wrote:
> 
> chris.b wrote:
>> here's a sample file that i wasn't able to index
>> http://www.nabble.com/file/p13972759/monte.doc monte.doc 
>> thanks for the help :)
> 
> As a last thing today I took a quick look at the file. A quick solution
> might be to skip the readProperties() call in the HWPFDocument
> constructor (don't know right now, whether the properties are really
> needed if you only read the Word file):
> 
>  public HWPFDocument(POIFSFileSystem pfilesystem) throws IOException
>   {
>     // Sort out the hpsf properties
>     filesystem = pfilesystem;
>     readProperties();    // <---- remove that one
>     ...
> 
> Depending on how much work you intend to do, you could either comment
> the line out and rebuild the library or subclass HWPFDocument and
> override readProperties() with an empty method (what I would recommend
> to try first). For the second case, you should get along by changing the
> WordExtractor constructor call in the code which you posted to:
> 
> WordExtractor docextractor = new WordExtractor(new MyHWPFDoc(docfin));
> 
> (MyHWPFDoc being a subclass of HWPFDocument with the empty
> readProperties() )
> 
> Best wishes, Rainer
> -- 
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscribe@poi.apache.org
> For additional commands, e-mail: user-help@poi.apache.org
> 
> 
> 

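For reference, the two options discussed above (catching the exception, or
subclassing with an empty readProperties()) can be sketched without POI on the
classpath. Everything in this sketch is a hypothetical stand-in: BaseDoc plays
the role of HWPFDocument, BadPropertySetException the role of
IllegalPropertySetDataException, and NoPropsDoc the role of MyHWPFDoc. It only
illustrates the mechanism, and assumes that readProperties() is overridable in
the POI version in use, as the quoted message suggests.

```java
// Stand-in classes only: none of these are real Apache POI types.
public class ReadPropertiesSketch {

    /** Hypothetical stand-in for the exception thrown on a bad property set. */
    static class BadPropertySetException extends RuntimeException {
        BadPropertySetException(String message) { super(message); }
    }

    /** Hypothetical stand-in for a document class whose constructor reads properties. */
    static class BaseDoc {
        protected final String text;

        BaseDoc(String text) {
            this.text = text;
            readProperties(); // runs the subclass override, if there is one
        }

        protected void readProperties() {
            // Simulates a file whose property streams cannot be parsed.
            throw new BadPropertySetException("unreadable property set");
        }

        String getText() { return text; }
    }

    /** Subclass that skips property parsing entirely. */
    static class NoPropsDoc extends BaseDoc {
        NoPropsDoc(String text) { super(text); }

        @Override
        protected void readProperties() {
            // Intentionally empty: document properties are not loaded.
        }
    }

    /** Option 1: catch the exception. Construction still fails, so there is no document. */
    static String tryBase(String text) {
        try {
            return new BaseDoc(text).getText();
        } catch (BadPropertySetException e) {
            return null;
        }
    }

    /** Option 2: use the subclass, so construction completes. */
    static String tryNoProps(String text) {
        return new NoPropsDoc(text).getText();
    }

    public static void main(String[] args) {
        System.out.println(tryBase("body text"));    // prints "null"
        System.out.println(tryNoProps("body text")); // prints "body text"
    }
}
```

Catching the exception is possible because it is unchecked here, but since it is
thrown from inside the constructor there is no document object left to extract
text from. The empty override avoids the failure at the cost of not loading the
document properties.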
-- 
View this message in context: http://www.nabble.com/Problem-with-word-documents-tf4877644.html#a14022545
Sent from the POI - User mailing list archive at Nabble.com.

