lucene-java-user mailing list archives

From David Spencer <dave-lucene-u...@tropo.com>
Subject Re: Existing Parsers
Date Thu, 09 Sep 2004 16:54:43 GMT
Honey George wrote:

> Hi,
>   I know some of them.
> 1. PDF
>  + http://www.pdfbox.org/
>  + http://www.foolabs.com/xpdf/download.html
>    - I am using this and found good. It even supports
>      various languages.

My dated experience from 2 years ago was that (the evil, native code) 
foolabs pdf parser was the best, but obviously things could have changed.

http://www.mail-archive.com/lucene-user@jakarta.apache.org/msg02912.html

> 2. word
>   + http://sourceforge.net/projects/wvware
> 3. excel
>   + http://www.jguru.com/faq/view.jsp?EID=1074230
> 
> -George
>  --- dhatcher@webtads.com wrote: 
> 
>>Anyone know of any reliable parsers out there for
>>pdf word 
>>excel or powerpoint?

For PowerPoint it's not easy. I've been using the package linked below. It 
worked fine until recently, but it now sometimes goes into an infinite loop 
on some recent PPTs. It is native code and the package seems to be dormant, 
but to some extent it does the job. The "ppthtml" executable does the work.

http://chicago.sourceforge.net/xlhtml
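
Since a native converter can hang like this, one way to protect an indexing 
job is to run the converter as a separate process with a time limit. The 
following is only a minimal sketch, not something from the tools above: the 
class name ConverterRunner and the method runWithTimeout are hypothetical, it 
assumes Java 9 or later, and it assumes the converter writes its result to 
standard output.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Optional;
import java.util.concurrent.TimeUnit;

// Hypothetical helper: runs an external converter with a time limit.
public class ConverterRunner {

    /**
     * Runs the given command and returns what it wrote to stdout, or
     * Optional.empty() if it timed out or exited with a non-zero status.
     */
    public static Optional<String> runWithTimeout(List<String> command, long timeoutSeconds)
            throws IOException, InterruptedException {
        // Capture stdout in a temp file so a full pipe buffer cannot stall the child.
        Path out = Files.createTempFile("converter", ".out");
        try {
            Process process = new ProcessBuilder(command)
                    .redirectOutput(out.toFile())
                    .redirectError(ProcessBuilder.Redirect.DISCARD) // Java 9+
                    .start();

            // Wait up to the limit; kill the process if it is still running.
            if (!process.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
                process.destroyForcibly();
                process.waitFor();
                return Optional.empty();
            }
            if (process.exitValue() != 0) {
                return Optional.empty();
            }
            return Optional.of(new String(Files.readAllBytes(out), StandardCharsets.UTF_8));
        } finally {
            Files.deleteIfExists(out);
        }
    }
}
```

A caller would pass the converter's command line as the list, for example 
the path to ppthtml followed by the PPT file name (assuming it prints its 
HTML to standard output), and skip or log the document when the result is 
empty instead of letting the indexer block.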



---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org

