On Sun, 29 Jan 2012, abc wrote:
> I was able to reuse the XWPFWordExtractorDecorator class, but it is just giving
> me text. How do I read the XHTML? Here is what I did,
You probably don't want to call that directly. Instead, you should be using
Tika in the normal way. This is taken from the Tika unit tests, and should
give you an idea:
    // Serialise the SAX events the parser emits into an XML string
    StringWriter sw = new StringWriter();
    SAXTransformerFactory factory =
        (SAXTransformerFactory) SAXTransformerFactory.newInstance();
    TransformerHandler handler = factory.newTransformerHandler();
    handler.getTransformer().setOutputProperty(OutputKeys.METHOD, "xml");
    handler.getTransformer().setOutputProperty(OutputKeys.INDENT, "yes");
    handler.setResult(new StreamResult(sw));

    // Try with a document containing various tables and formattings
    // ("parser" and XMLResult come from the surrounding test code, not shown)
    InputStream input = new FileInputStream("file.docx");
    try {
        Metadata metadata = new Metadata();
        parser.parse(input, handler, metadata, new ParseContext());
        return new XMLResult(sw.toString(), metadata);
    } finally {
        input.close();
    }
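
To see what the handler half of that does on its own, here is a minimal
sketch that needs only the JDK. It is not from the Tika tests: instead of
calling a Tika parser, it fires a few XHTML SAX events by hand (roughly the
kind of events a parser would send) and lets the TransformerHandler turn
them into a string. The class name HandlerSketch and the render() method
are made up for illustration.

```java
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.helpers.AttributesImpl;

// Hypothetical helper class, for illustration only
public class HandlerSketch {
    static final String XHTML = "http://www.w3.org/1999/xhtml";

    // Serialises a tiny hand-made XHTML event stream to a string.
    // The manual startElement/characters calls stand in for a real parser.
    static String render(String paragraphText) throws Exception {
        StringWriter sw = new StringWriter();
        SAXTransformerFactory factory =
            (SAXTransformerFactory) SAXTransformerFactory.newInstance();
        TransformerHandler handler = factory.newTransformerHandler();
        handler.getTransformer().setOutputProperty(OutputKeys.METHOD, "xml");
        handler.getTransformer().setOutputProperty(
            OutputKeys.OMIT_XML_DECLARATION, "yes");
        handler.setResult(new StreamResult(sw));

        AttributesImpl noAttrs = new AttributesImpl();
        handler.startDocument();
        handler.startPrefixMapping("", XHTML);
        handler.startElement(XHTML, "html", "html", noAttrs);
        handler.startElement(XHTML, "body", "body", noAttrs);
        handler.startElement(XHTML, "p", "p", noAttrs);
        char[] chars = paragraphText.toCharArray();
        handler.characters(chars, 0, chars.length);
        handler.endElement(XHTML, "p", "p");
        handler.endElement(XHTML, "body", "body");
        handler.endElement(XHTML, "html", "html");
        handler.endPrefixMapping("");
        handler.endDocument();
        return sw.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(render("Hello from a fake parser"));
    }
}
```

With Tika, the parser.parse(...) call takes the place of the hand-written
events, so whatever structure the parser reports (paragraphs, tables and so
on) ends up in the StringWriter as markup rather than as plain text.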
Nick