Date: Mon, 11 Jan 2010 14:31:23 +0000 (GMT)
From: Nick Burch
To: POI Users List
Subject: Re: WordExtractor.getText() returns ^U on word docs.

On Mon, 11 Jan 2010, maxSchlein wrote:

> It appears that when I use WordExtractor.getText(), and there are tables in
> the document, it returns ^U for every table column. Is there a way to have
> this filtered out other than looping through the returned text?

Did you try passing it through WordExtractor.stripFields?

Nick
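For context: ^U is the control character 0x15. The binary Word format uses it as a field-end marker. A field is stored roughly as 0x13 (begin), the field code, an optional 0x14 (separator) followed by the displayed result, and then 0x15 (end). That is why field stripping is suggested here rather than table handling. With POI on the classpath, the suggested call would presumably look something like `WordExtractor.stripFields(new WordExtractor(in).getText())`. The stdlib-only sketch below only illustrates the idea. The class `FieldStripSketch` is hypothetical and is not POI's implementation. It assumes the marker layout just described, drops field codes and markers, and keeps any field results.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/**
 * Hypothetical illustration of Word field stripping (NOT Apache POI's code).
 * Assumed field layout: 0x13 <field code> [0x14 <field result>] 0x15
 */
public class FieldStripSketch {
    private static final char FIELD_BEGIN = '\u0013';
    private static final char FIELD_SEPARATOR = '\u0014';
    private static final char FIELD_END = '\u0015';

    public static String stripFields(String text) {
        StringBuilder out = new StringBuilder(text.length());
        // One entry per currently open field: TRUE once its separator has been
        // seen, i.e. we are in the visible result rather than the field code.
        Deque<Boolean> open = new ArrayDeque<>();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == FIELD_BEGIN) {
                open.push(Boolean.FALSE);
            } else if (c == FIELD_SEPARATOR) {
                if (!open.isEmpty()) {
                    open.pop();
                    open.push(Boolean.TRUE);
                }
            } else if (c == FIELD_END) {
                if (!open.isEmpty()) {
                    open.pop();
                }
            } else if (isVisible(open)) {
                // Markers are never copied; ordinary text is copied only if visible.
                out.append(c);
            }
        }
        return out.toString();
    }

    // Text is kept only if every enclosing field is in its result part.
    private static boolean isVisible(Deque<Boolean> open) {
        for (Boolean inResult : open) {
            if (!inResult) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String raw = "Page \u0013 PAGE \u00143\u0015 of 10";
        System.out.println(stripFields(raw)); // prints "Page 3 of 10"
    }
}
```

A stack is used so that nested fields, such as a PAGE field inside an IF field, do not leak their inner code into the output.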