jackrabbit-dev mailing list archives

From Marcel Reutegger <marcel.reuteg...@gmx.net>
Subject Re: MsWordTextFilter Problem
Date Wed, 17 May 2006 05:17:25 GMT
Hi Thomas,

The Jackrabbit text filter for MS Word documents depends on the textmining
library and Apache POI. Maybe you can find some information or hints on the
mailing lists of those projects?


thomasg wrote:
> Has anyone encountered problems with this text filter? I am testing the text
> extraction of quite a large document (6MB worth of Thinking In Java by
> captain Bruce Eckel). Searching was not producing the expected results. I have
> taken the Reader object generated by the MsWordTextFilter, converted it
> into a String and written it to a file. Inspection shows that most of the
> document has been omitted. The missing part is in the middle of the file and
> there are no particularly unusual contents that mark the start of the missing
> section. I've tested larger docs that work fine, so it's a bit of a mystery.
> Cheers, Thomas
> --
> View this message in context: http://www.nabble.com/MsWordTextFilter-Problem-t1626136.html#a4406009
> Sent from the Jackrabbit - Dev forum at Nabble.com.
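
The check described in the quoted message (drain the Reader into a String, then write it to a file for inspection) might look roughly like the sketch below. It uses only the JDK; the class name ReaderDump and its methods are hypothetical, and a StringReader stands in for the Reader that the filter would return, since the filter call itself is not shown in this thread. The loop reads until end of stream because a single read() call is allowed to return fewer characters than the stream holds, so it is worth ruling out the dump code before suspecting the extraction.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class ReaderDump {

    // Drain the reader completely; read() may return partial chunks,
    // so keep reading until it signals end of stream with -1.
    public static String readFully(Reader reader) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[8192];
        int n;
        while ((n = reader.read(buf)) != -1) {
            sb.append(buf, 0, n);
        }
        return sb.toString();
    }

    // Write the extracted text to a file and return the character count,
    // which can be compared against the expected size of the document text.
    public static int dump(Reader reader, Path target) throws IOException {
        String text = readFully(reader);
        Files.write(target, text.getBytes(StandardCharsets.UTF_8));
        return text.length();
    }

    public static void main(String[] args) throws IOException {
        // Assumption: in the real test this Reader would come from the text filter.
        Reader extracted = new StringReader("stand-in for extracted text");
        Path out = Files.createTempFile("extracted", ".txt");
        int chars = dump(extracted, out);
        System.out.println(chars + " chars written to " + out);
    }
}
```

If the character count is already far too small at this point, the text was lost before the Reader was consumed, which would point at the extraction libraries rather than at the indexing or search side.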
