lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Chris Fraschetti <frasche...@gmail.com>
Subject Re: Existing Parsers
Date Thu, 09 Sep 2004 19:05:39 GMT
Some of the tools listed use cmd line execs to output a doc of some
sort to text and then I grab the text and add it to a lucene doc, etc
etc...

Any stats on the scalability of that? In large scale applications, I'm
assuming this will cause some serious issues... anyone have any input
on this?

-Chris Fraschetti


On Thu, 09 Sep 2004 09:54:43 -0700, David Spencer
<dave-lucene-user@tropo.com> wrote:
> Honey George wrote:
> 
> > Hi,
> >   I know some of them.
> > 1. PDF
> >  + http://www.pdfbox.org/
> >  + http://www.foolabs.com/xpdf/download.html
> >    - I am using this and found good. It even supports
> 
> My dated experience from 2 years ago was that (the evil, native code)
> foolabs pdf parser was the best, but obviously things could have changed.
> 
> http://www.mail-archive.com/lucene-user@jakarta.apache.org/msg02912.html
> 
> >      various languages.
> > 2. word
> >   + http://sourceforge.net/projects/wvware
> > 3. excel
> >   + http://www.jguru.com/faq/view.jsp?EID=1074230
> >
> > -George
> >  --- dhatcher@webtads.com wrote:
> >
> >>Anyone know of any reliable parsers out there for
> >>pdf word
> >>excel or powerpoint?
> 
> For powerpoint it's not easy. I've been using this and it has worked
> fine util recently and seems to sometimes go into an infinite loop now
> on some recent PPTs. Native code and a package that seems to be dormant
> but to some extent it does the job. The file "ppthtml" does the work.
> 
> http://chicago.sourceforge.net/xlhtml
> 
> 
> 
> >>
> >>
> >
> > ---------------------------------------------------------------------
> >
> >>To unsubscribe, e-mail:
> >>lucene-user-unsubscribe@jakarta.apache.org
> >>For additional commands, e-mail:
> >>lucene-user-help@jakarta.apache.org
> >>
> >>
> >
> >
> >
> >
> >
> >
> > ___________________________________________________________ALL-NEW Yahoo! Messenger
- all new features - even more fun!  http://uk.messenger.yahoo.com
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> > For additional commands, e-mail: lucene-user-help@jakarta.apache.org
> >
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-user-help@jakarta.apache.org
> 
>

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Mime
View raw message