poi-user mailing list archives

From "Dmitry Goldenberg" <dmitry.goldenb...@weblayers.com>
Subject RE: reading MS word file
Date Fri, 18 May 2007 14:39:37 GMT
http://textmining.org/

________________________________

From: Henry Lu [mailto:zhlu@umich.edu]
Sent: Fri 5/18/2007 7:22 AM
To: POI Users List
Subject: Re: reading MS word file



Where do I download the org.textmining package?

-Henry


Dmitry Goldenberg wrote:
> Henry,
> 
> There are a few things you can try.
> 
> 1. Take a look at org.textmining's Word text extractor:
>
> org.textmining.text.extraction.WordExtractor
>
> All you have to do is this:
>
> new WordExtractor().extractText(inputStream)
>
> 2. There is also the POI extractor:
>
> org.apache.poi.hdf.extractor.WordDocument
>
> All you do is:
>
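> // 'is' is an InputStream already opened on the .doc file
> WordDocument wd = new WordDocument(is);
> StringWriter docTextWriter = new StringWriter();
> PrintWriter out = new PrintWriter(docTextWriter);
> wd.writeAllText(out);
> out.flush(); // make sure everything has reached the StringWriter
> String text = docTextWriter.toString();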
>
> 3. I'd also check out the following:
>
> org.semanticdesktop.aperture.extractor.word.WordExtractor
>
> here: http://aperture.sourceforge.net/doc/javadoc/index.html
>
> Hope this helps,
> - Dmitry
>
>
> ________________________________
>
> From: Henry Lu [mailto:zhlu@umich.edu]
> Sent: Thu 5/17/2007 1:19 PM
> To: poi-user@jakarta.apache.org
> Subject: reading MS word file
>
>
>
> Is there example code for reading the text of an MS Word file line by line?
> All I am interested in is the text, regardless of format, style, or font.
>
> -Henry
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: poi-user-unsubscribe@jakarta.apache.org
> Mailing List:     http://jakarta.apache.org/site/mail2.html#poi
> The Apache Jakarta Poi Project:  http://jakarta.apache.org/poi/
>
>
>
>
>  
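On the "line by line" part of the original question: the extractors mentioned in this thread hand back the whole document text as a single String, so the splitting has to happen afterwards. Below is a minimal sketch of that step using only the JDK. The class and method names (ExtractedTextLines, toLines) are made up for illustration and are not part of POI or org.textmining, and the sample text in main stands in for real extractor output. BufferedReader.readLine() is used because it accepts "\n", "\r" and "\r\n" as line ends, which helps if the extracted text carries bare carriage returns as paragraph marks.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class ExtractedTextLines {

    // Splits already-extracted text into lines.
    // Assumption: 'text' was produced by one of the extractors
    // discussed in this thread; this method never touches the .doc file.
    public static List<String> toLines(String text) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            String line;
            // readLine() treats \n, \r and \r\n as line terminators
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            // Not expected when reading from an in-memory String
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        // Placeholder text standing in for real extractor output
        String text = "First paragraph\r\nSecond paragraph\rThird paragraph\n";
        for (String line : toLines(text)) {
            System.out.println(line);
        }
    }
}
```

In practice the String passed to toLines would be the result of whichever extractor call is chosen, and each returned entry corresponds to one paragraph or line of the document text.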
