On Apr 19, 2005, at 7:37 AM, Eric Chow wrote:
> Hello,
>
> Is there any RTF text extractor for Lucene ?
You can use the Swing text classes to do this. The following is from the
Lucene in Action code (http://www.lucenebook.com/search?query=rtf):
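import java.io.IOException;
import java.io.InputStream;
import javax.swing.text.BadLocationException;
import javax.swing.text.DefaultStyledDocument;
import javax.swing.text.rtf.RTFEditorKit;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
// DocumentHandlerException is defined in the book's sample code.

public Document getDocument(InputStream is)
    throws DocumentHandlerException {
  String bodyText = null;
  DefaultStyledDocument styledDoc = new DefaultStyledDocument();
  try {
    // Parse the RTF stream into a Swing styled document,
    // then pull out the plain text without any formatting.
    new RTFEditorKit().read(is, styledDoc, 0);
    bodyText = styledDoc.getText(0, styledDoc.getLength());
  }
  catch (IOException e) {
    throw new DocumentHandlerException(
      "Cannot extract text from a RTF document", e);
  }
  catch (BadLocationException e) {
    throw new DocumentHandlerException(
      "Cannot extract text from a RTF document", e);
  }
  if (bodyText != null) {
    // Index the text in a "body" field without storing it.
    Document doc = new Document();
    doc.add(Field.UnStored("body", bodyText));
    return doc;
  }
  return null;
}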
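
If you want to try the extraction step on its own before wiring it into Lucene, here is a minimal sketch that uses only the JDK. The class name RtfTextExtractor and the method extractText are made up for this example and are not part of the Lucene in Action code; unlike the method above, it reports failures with unchecked exceptions instead of DocumentHandlerException.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import javax.swing.text.BadLocationException;
import javax.swing.text.DefaultStyledDocument;
import javax.swing.text.rtf.RTFEditorKit;

// Hypothetical helper (not from the book): RTF stream in, plain text out.
final class RtfTextExtractor {

    private RtfTextExtractor() {
    }

    static String extractText(InputStream in) {
        DefaultStyledDocument styledDoc = new DefaultStyledDocument();
        try {
            // Let the Swing RTF parser populate the styled document.
            new RTFEditorKit().read(in, styledDoc, 0);
            // Return the document's characters with all styling dropped.
            return styledDoc.getText(0, styledDoc.getLength());
        } catch (IOException e) {
            throw new UncheckedIOException("Cannot read RTF input", e);
        } catch (BadLocationException e) {
            throw new IllegalStateException(
                "Cannot extract text from RTF document", e);
        }
    }
}
```

Passing the returned string to Field.UnStored("body", ...) gives the same document as the method above. No Swing components are created, so this should also work on a server without a display.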
Erik