From: bugzilla@apache.org
To: dev@poi.apache.org
Subject: [Bug 58858] New: hidden characters not removed
Date: Thu, 14 Jan 2016 09:15:35 +0000

https://bz.apache.org/bugzilla/show_bug.cgi?id=58858

            Bug ID: 58858
           Summary: hidden characters not removed
           Product: POI
           Version: unspecified
          Hardware: PC
            Status: NEW
          Severity: critical
          Priority: P2
         Component: HWPF
          Assignee: dev@poi.apache.org
          Reporter: sebastian.a.aguirre@gmail.com

Created attachment 33442
  --> https://bz.apache.org/bugzilla/attachment.cgi?id=33442&action=edit
sample doc file to test

After reading the file and converting its contents to a String, the hidden
characters are not removed. This happens in XWPF as well.

To read the file I'm using a very simple method:

    import java.io.File;
    import java.io.FileInputStream;
    import org.apache.poi.hwpf.HWPFDocument;
    import org.apache.poi.hwpf.extractor.WordExtractor;

    // Open the binary .doc file and extract its full text
    File file = new File("file.doc");
    FileInputStream fis = new FileInputStream(file);
    HWPFDocument doc = new HWPFDocument(fis);
    WordExtractor ex = new WordExtractor(doc);
    String toReturn = ex.getText();

The same thing happens with XWPF, using equally simple code:

    import org.apache.poi.xwpf.extractor.XWPFWordExtractor;
    import org.apache.poi.xwpf.usermodel.XWPFDocument;

    // fis: a FileInputStream opened as above (XWPF reads the .docx format)
    XWPFDocument doc = new XWPFDocument(fis);
    XWPFWordExtractor ex = new XWPFWordExtractor(doc);
    String toReturn = ex.getText();

I'm attaching a file you can use as a sample. You can show/hide the hidden
characters with Ctrl+Shift+8.

Thanks.
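
Until the extractors handle this, one possible stopgap is to post-process the extracted String. The sketch below is not part of POI, and the class name ExtractedTextCleaner and the method stripNonPrinting are hypothetical. It assumes that "hidden characters" means non-printing control or format characters left in the extracted text; if the report is about text formatted as hidden in Word, this filter would not remove it.

```java
// Hypothetical helper, not a POI class: filters the String returned by getText().
final class ExtractedTextCleaner {

    private ExtractedTextCleaner() {
    }

    /**
     * Returns the text without Unicode control (Cc) and format (Cf) characters.
     * Tab, line feed and carriage return are kept so the layout survives.
     * Assumption: the unwanted characters fall into these two categories.
     */
    static String stripNonPrinting(String text) {
        StringBuilder out = new StringBuilder(text.length());
        text.codePoints().forEach(cp -> {
            int type = Character.getType(cp);
            boolean layoutWhitespace = cp == '\t' || cp == '\n' || cp == '\r';
            boolean nonPrinting = type == Character.CONTROL || type == Character.FORMAT;
            if (layoutWhitespace || !nonPrinting) {
                out.appendCodePoint(cp);
            }
        });
        return out.toString();
    }
}
```

With the snippets above, this could be applied as `String cleaned = ExtractedTextCleaner.stripNonPrinting(ex.getText());`. Filtering by Unicode category rather than by a fixed list of code points keeps the helper short, at the cost of also dropping format characters such as zero-width joiners that some scripts rely on.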