jackrabbit-dev mailing list archives

From David Nuescheler <david.nuesche...@gmail.com>
Subject Re: Text filters for binary documents
Date Wed, 01 Jun 2005 01:34:18 GMT
hi jan,

thanks a lot for the contribution. this sounds very interesting.
we will have a look at that asap.


On 5/27/05, Ján Halaša <halasalist@aura.cz> wrote:
> Hi everybody,
> I have converted some text filters (for extracting text content from
> binary files) from the Jakarta Slide project so that they implement
> the TextFilter interface. Slide sources share the same Apache 2.0 license.
> I used the existing TextPlainTextFilter class as a template, so they do
> not accept multi-valued properties.
> I'll be glad if someone takes a look and integrates them with Jackrabbit
> somehow.
> http://www.halasa.com/jackrabbit/ApplicationMsExcelTextFilter.java
> http://www.halasa.com/jackrabbit/ApplicationMsWordTextFilter.java
> http://www.halasa.com/jackrabbit/ApplicationPdfTextFilter.java
> You will need these extra libraries:
> PDFBox-0.7.1.jar (http://www.pdfbox.org/)
> poi-2.5.1-final-20040804.jar (http://jakarta.apache.org/poi/)
> tm-extractors-0.4.jar (http://www.textmining.org/)
> Jan
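
For readers who have not seen such filters, here is a minimal sketch of the idea described above: each filter declares which MIME type it handles, and the caller picks the first matching filter to turn binary content into text. The names SimpleTextFilter, PlainTextFilter and extract are hypothetical and simplified for illustration; they are not Jackrabbit's actual TextFilter API, and the sketch handles a single value only. UTF-8 decoding is also an assumption.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

public class TextFilterSketch {

    /** Hypothetical, simplified filter contract (not Jackrabbit's TextFilter). */
    public interface SimpleTextFilter {
        /** Returns true if this filter understands the given MIME type. */
        boolean canFilter(String mimeType);

        /** Extracts the text content of a single binary value. */
        String extractText(InputStream data) throws IOException;
    }

    /** Plain-text filter; assumes the content is UTF-8 encoded. */
    public static class PlainTextFilter implements SimpleTextFilter {
        @Override
        public boolean canFilter(String mimeType) {
            return "text/plain".equalsIgnoreCase(mimeType);
        }

        @Override
        public String extractText(InputStream data) throws IOException {
            StringBuilder text = new StringBuilder();
            Reader reader = new InputStreamReader(data, StandardCharsets.UTF_8);
            char[] buffer = new char[4096];
            int read;
            while ((read = reader.read(buffer)) != -1) {
                text.append(buffer, 0, read);
            }
            return text.toString();
        }
    }

    /**
     * Runs the first filter that accepts the MIME type.
     * Returns an empty Optional when no filter matches.
     */
    public static Optional<String> extract(List<SimpleTextFilter> filters,
                                           String mimeType,
                                           byte[] content) {
        for (SimpleTextFilter filter : filters) {
            if (filter.canFilter(mimeType)) {
                try {
                    return Optional.of(
                            filter.extractText(new ByteArrayInputStream(content)));
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        List<SimpleTextFilter> filters =
                Arrays.<SimpleTextFilter>asList(new PlainTextFilter());
        byte[] content = "hello".getBytes(StandardCharsets.UTF_8);
        System.out.println(extract(filters, "text/plain", content).orElse("(no filter)"));
    }
}
```

A filter for PDF, Word or Excel content would implement the same two methods and delegate the parsing to one of the libraries listed above; that part is omitted here because it depends on the library version.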

standardize your content-repository !
---------------------------------------< david.nuescheler@day.com >---

David Nuescheler
Chief Technology Officer
Day Software AG
Barfuesserplatz 6 / Postfach
4001 Basel

T  41 61 226 98 98
F  41 61 226 98 97