lucene-java-user mailing list archives

From Erik Hatcher <e...@ehatchersolutions.com>
Subject Re: RTF text extractor ?
Date Tue, 19 Apr 2005 12:28:04 GMT

On Apr 19, 2005, at 7:37 AM, Eric Chow wrote:

> Hello,
>
> Is there any RTF text extractor for Lucene ?

You can use Swing's RTF support (RTFEditorKit reading into a 
DefaultStyledDocument) to do this.  The following is from the Lucene in 
Action code (http://www.lucenebook.com/search?query=rtf):

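   // Uses javax.swing.text.DefaultStyledDocument,
   // javax.swing.text.BadLocationException,
   // javax.swing.text.rtf.RTFEditorKit, java.io.InputStream,
   // java.io.IOException, and Lucene's Document and Field.
   public Document getDocument(InputStream is)
     throws DocumentHandlerException {

     String bodyText = null;

     // Let Swing parse the RTF into a styled document, then pull
     // out the plain text.
     DefaultStyledDocument styledDoc = new DefaultStyledDocument();
     try {
       new RTFEditorKit().read(is, styledDoc, 0);
       bodyText = styledDoc.getText(0, styledDoc.getLength());
     }
     catch (IOException e) {
       throw new DocumentHandlerException(
         "Cannot extract text from an RTF document", e);
     }
     catch (BadLocationException e) {
       throw new DocumentHandlerException(
         "Cannot extract text from an RTF document", e);
     }

     // Index the extracted text in an unstored "body" field.
     if (bodyText != null) {
       Document doc = new Document();
       doc.add(Field.UnStored("body", bodyText));
       return doc;
     }
     return null;
   }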
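
If you only want the text and not a Lucene Document, the extraction step 
can be pulled out on its own.  What follows is a minimal sketch, not code 
from the book: the class name RtfTextExtractor and the method extractText 
are made up for illustration, and it assumes a JDK that ships the Swing 
text classes.  It wraps BadLocationException in an IOException so callers 
have a single exception type to handle.

```java
import java.io.IOException;
import java.io.InputStream;
import javax.swing.text.BadLocationException;
import javax.swing.text.DefaultStyledDocument;
import javax.swing.text.rtf.RTFEditorKit;

// Hypothetical helper; the class and method names are illustrative only.
class RtfTextExtractor {

    /** Parses RTF from the stream with Swing and returns its plain text. */
    static String extractText(InputStream is) throws IOException {
        DefaultStyledDocument styledDoc = new DefaultStyledDocument();
        try {
            // Same two calls as in getDocument(): parse, then read the text back.
            new RTFEditorKit().read(is, styledDoc, 0);
            return styledDoc.getText(0, styledDoc.getLength());
        } catch (BadLocationException e) {
            throw new IOException("Cannot extract text from an RTF document", e);
        }
    }
}
```

You would call it as RtfTextExtractor.extractText(new 
FileInputStream("some.rtf")) and then hand the returned string to whatever 
field you index, exactly as the "body" field is built above.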


Erik

