poi-user mailing list archives

From "Raghu Kaippully" <raghu...@gmail.com>
Subject Re: Is POI-HWPF really the best way to extract text from Word files?
Date Sat, 15 Mar 2008 03:43:08 GMT
All the HWPF code is under the scratchpad section of the Subversion repository:
http://svn.apache.org/viewvc/poi/trunk/src/scratchpad/

For a simple example, have a look at
testcases/org/apache/poi/hwpf/usermodel/TestProblems.java under the
scratchpad. It has a method testRangeDelete() that scans the text pieces.
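
Once the text is out of the document, the keyword search itself is plain
Java. Below is a minimal sketch of that step only, assuming the document text
has already been extracted into a String (for example with HWPF). The class
name KeywordScan, the method findKeywords and the sample keywords are made up
for illustration and are not part of POI.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class KeywordScan {

    // Returns the keywords that occur somewhere in the text, ignoring case.
    // The result keeps the order of the keywords argument.
    public static List<String> findKeywords(String text, List<String> keywords) {
        String haystack = text.toLowerCase(Locale.ROOT);
        List<String> found = new ArrayList<>();
        for (String keyword : keywords) {
            if (haystack.contains(keyword.toLowerCase(Locale.ROOT))) {
                found.add(keyword);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        // Assumption: in real use this string would be the text that HWPF
        // extracted from the Word file; here it is a hard-coded sample.
        String text = "Senior developer with Java and SQL experience.";
        List<String> keywords = Arrays.asList("java", "sql", "cobol");
        System.out.println(findKeywords(text, keywords));
    }
}
```

Note that this is a plain substring match, so "java" would also match
"JavaScript". If that matters for your CVs, a word-boundary check would be
needed.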

-Raghu

On Fri, Mar 14, 2008 at 9:46 PM, Ylva Degerfeldt <ylva.degerfeldt@gmail.com>
wrote:

> Yes, I'm only interested in extracting the text (more specifically,
> searching for different keywords in CVs in Word format).
>
> Where can I find those JUnit testcases? (I'm new to this whole thing.)
>
> /Ylva
>
> On Fri, Mar 14, 2008 at 4:59 PM, Raghu Kaippully <raghu.kb@gmail.com>
> wrote:
> > Are you just looking to extract text from Word documents? Then HWPF
> >  will probably do the trick. I am not familiar with Clean Content SDK, so
> >  I can't comment on that. Why don't you give HWPF a try? Some of the JUnit
> >  testcases already extract text; maybe you can have a look at them.
> >
> >  -Raghu
> >
> >  On Fri, Mar 14, 2008 at 9:15 PM, Ylva Degerfeldt <ylva.degerfeldt@gmail.com>
> >  wrote:
> >
> >
> >
> >  > Hi everyone,
> >  >
> >  > Maybe I shouldn't ask this on this mailing list but I'm about to start
> >  > on a project where I'm going to extract different keywords from Word
> >  > files in the most common formats (like 97 - 2003) and I'd like to know
> >  > before I start if using POI-HWPF really is the best way to do that.
> >  >
> >  > The thing is.. I think I have found another way to do it: Oracle's
> >  > Clean Content SDK. Has anyone tried this? I was just wondering if it's
> >  > worth the time and effort to dig deeper into that or if I should
> >  > simply decide that POI-HWPF is the best solution and forget about the
> >  > other one. (I have a bit of a tight schedule so that's why I'm
> >  > asking.)
> >  >
> >  > Thanks in advance,
> >  >
> >  > Ylva
> >  >
> >  > ---------------------------------------------------------------------
> >  > To unsubscribe, e-mail: user-unsubscribe@poi.apache.org
> >  > For additional commands, e-mail: user-help@poi.apache.org
> >  >
> >  >
> >
>
