Date: Tue, 14 Feb 2006 11:03:14 +0000 (GMT)
From: Nick Burch
To: java-user@lucene.apache.org
Subject: Re: Word files & Build vs. Buy?

On Thu, 9 Feb 2006, Christiaan Fluit wrote:
> Yes, that's exactly what I'm doing. Having this in POI would benefit me
> a lot though, as I hardly understand the POI basics to be honest (my
> fault, not POI's).

OK, that's now in POI. You'll need a scratchpad build from late yesterday
or today; see http://encore.torchbox.com/poi-cvs-build/ for jars.

The code is in org.apache.poi.hwpf.extractor.WordExtractor. It supports
grabbing all the text, or grabbing an array of the text in each paragraph.

If you have any problems, queries or comments on it, you'll probably get
a better response on poi-user than here!

Nick