poi-dev mailing list archives

From bugzi...@apache.org
Subject [Bug 58858] New: hidden characters not removed
Date Thu, 14 Jan 2016 09:15:35 GMT
https://bz.apache.org/bugzilla/show_bug.cgi?id=58858

            Bug ID: 58858
           Summary: hidden characters not removed
           Product: POI
           Version: unspecified
          Hardware: PC
            Status: NEW
          Severity: critical
          Priority: P2
         Component: HWPF
          Assignee: dev@poi.apache.org
          Reporter: sebastian.a.aguirre@gmail.com

Created attachment 33442
  --> https://bz.apache.org/bugzilla/attachment.cgi?id=33442&action=edit
sample doc file to test

After reading the file and converting its contents to a String, the hidden
characters are still present in the extracted text.
This happens with XWPF as well.

To read the file I'm using a very simple method:

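// Needs java.io.File, java.io.FileInputStream,
// org.apache.poi.hwpf.HWPFDocument, org.apache.poi.hwpf.extractor.WordExtractor
File file = new File("file.doc");
String toReturn;
// try-with-resources closes the stream even if parsing fails
try (FileInputStream fis = new FileInputStream(file)) {
    HWPFDocument doc = new HWPFDocument(fis);
    WordExtractor ex = new WordExtractor(doc);
    toReturn = ex.getText();
}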

The same thing happens with XWPF, using equally simple code:

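// Needs org.apache.poi.xwpf.usermodel.XWPFDocument and
// org.apache.poi.xwpf.extractor.XWPFWordExtractor; fis must be opened on a .docx (OOXML) file
XWPFDocument doc = new XWPFDocument(fis);
XWPFWordExtractor ex = new XWPFWordExtractor(doc);
String toReturn = ex.getText();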

I'm attaching a file you can use as a sample.
In Word, you can show/hide the hidden characters with Ctrl+Shift+8.
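
To make the expected result concrete, here is a minimal sketch of the behaviour being asked for, assuming "hidden characters" means text that carries Word's Hidden attribute (the report does not say so explicitly). It deliberately does not call POI: `Run` and `visibleText` are hypothetical stand-ins that model a document as runs of text with a hidden flag. In HWPF, `CharacterRun.isVanished()` looks like the flag that could supply this information, but whether it matches the characters in the attached sample is untested.

```java
import java.util.ArrayList;
import java.util.List;

public class VisibleTextSketch {

    // Hypothetical stand-in for a document run: a span of text plus its "hidden" flag.
    static final class Run {
        final String text;
        final boolean hidden;

        Run(String text, boolean hidden) {
            this.text = text;
            this.hidden = hidden;
        }
    }

    // Concatenate the text of every run that is not marked hidden.
    static String visibleText(List<Run> runs) {
        StringBuilder sb = new StringBuilder();
        for (Run run : runs) {
            if (!run.hidden) {
                sb.append(run.text);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<Run> runs = new ArrayList<>();
        runs.add(new Run("Visible ", false));
        runs.add(new Run("secret ", true));
        runs.add(new Run("text", false));
        System.out.println(visibleText(runs)); // prints "Visible text"
    }
}
```

With the extractors as they are today, the equivalent of the "secret " run ends up in the returned String.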

Thanks.

-- 
You are receiving this mail because:
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org

