jackrabbit-dev mailing list archives

From "Jukka Zitting" <jukka.zitt...@gmail.com>
Subject Re: Restricting xpath query to document text
Date Wed, 17 May 2006 09:14:50 GMT

On 5/17/06, thomasg <thomasgascoigne@hotmail.com> wrote:
> One slight worry, have you visited www.textmining.org lately?
> Doesn't seem too healthy!

The site has been hacked since December. :-( Would it make sense to
consider alternatives? Some ideas that come to mind:

a) Contact the Jakarta POI community for their suggestions.

b) Implement a generic text filter that pipes the binary stream
through an external application like catdoc and reads the output as
plain text to be indexed.

c) Implement a text filter that uses an OpenOffice "server" through
the UNO API to manipulate Word and other types of documents.
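
To make option b) more concrete, here is a rough sketch of such a filter. It is not tied to any existing Jackrabbit interface: the class name ExternalTextFilter and the extract() method are made up for illustration. It also assumes that the external tool reads the document from standard input and writes plain text to standard output.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.Charset;
import java.util.List;

// Hypothetical helper, not an existing Jackrabbit class.
final class ExternalTextFilter {

    // Pipes the binary stream through the given command and returns whatever
    // the command prints on stdout, decoded with the given charset.
    static String extract(InputStream binary, List<String> command, Charset charset)
            throws IOException, InterruptedException {
        ProcessBuilder builder = new ProcessBuilder(command);
        // Let the tool's error messages go to our own stderr.
        builder.redirectError(ProcessBuilder.Redirect.INHERIT);
        Process process = builder.start();

        // Feed stdin on a separate thread so that a tool which starts writing
        // output before it has read all input cannot deadlock us.
        Thread feeder = new Thread(() -> {
            try (OutputStream stdin = process.getOutputStream()) {
                copy(binary, stdin);
            } catch (IOException e) {
                // The tool may exit before consuming all input;
                // the exit code check below reports real failures.
            }
        });
        feeder.start();

        ByteArrayOutputStream text = new ByteArrayOutputStream();
        try (InputStream stdout = process.getInputStream()) {
            copy(stdout, text);
        }

        feeder.join();
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException(command + " exited with code " + exitCode);
        }
        return new String(text.toByteArray(), charset);
    }

    private static void copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
    }
}
```

The command would be something like a single-element list containing "catdoc", provided the installed tool accepts the document on standard input. The right charset depends on how the tool is configured, so both are left to the caller. A real filter would probably also want a timeout and a size limit on the output.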


Jukka Zitting

Yukatan - http://yukatan.fi/ - info@yukatan.fi
Software craftsmanship, JCR consulting, and Java development