lucene-java-user mailing list archives

From Nick Burch <n...@torchbox.com>
Subject Re: http://www.textmining.org/ is "hacked"
Date Fri, 25 Nov 2005 10:15:49 GMT
On Thu, 24 Nov 2005, Guilherme Barile wrote:
> The project seems somehow abandoned

Ryan (the guy behind it) has gone to work for a firm that has the full
Word format documentation from Microsoft, so he's no longer able to
contribute to open source projects working with Word documents.

> Also if you find something else (cross platform) for extracting text 
> from word documents, please let me know

You can use POI (http://jakarta.apache.org/poi/) to extract text from Word
documents, as well as from Excel and PowerPoint files.

The current Word code in POI is similar to the textmining stuff (it was
also written by Ryan). There's snazzier Word support coming quite soon for
POI (a company has paid for it, and it's getting open sourced once they
sign off on it), but you'd have to ask on poi-user for the latest
timescale on that.

Nick
