poi-user mailing list archives

From John Brainard <jbrain...@glynlyon.com>
Subject Re: UTF-8 Encoding
Date Fri, 08 Sep 2017 18:48:48 GMT
Thank you Dominik.

Using 'com.github.pjfanning:xmlbeans:2.6.2' fixes the issue.
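
A note for anyone reproducing this: in the quoted output further down, every mathematical-italic letter comes out as "??" while the subscript one survives. That pattern is what one would expect if characters outside the Basic Multilingual Plane, which Java stores as two-char surrogate pairs, were being encoded one char at a time. The JDK-only sketch below illustrates that effect. The class and method names are made up for illustration, and it makes no claim about how XMLBeans is actually implemented.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of POI or XMLBeans.
class SurrogateCheck {

    /** Counts code points above U+FFFF, i.e. those a Java String stores as a surrogate pair. */
    static long countSupplementary(String s) {
        return s.codePoints().filter(Character::isSupplementaryCodePoint).count();
    }

    /**
     * Encodes the string to UTF-8 one char at a time and decodes the result.
     * A lone surrogate cannot be encoded, so String.getBytes substitutes '?'
     * for each half of a surrogate pair.
     */
    static String encodeCharByChar(String s) {
        StringBuilder out = new StringBuilder();
        for (char c : s.toCharArray()) {
            byte[] bytes = String.valueOf(c).getBytes(StandardCharsets.UTF_8);
            out.append(new String(bytes, StandardCharsets.UTF_8));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // U+1D466 MATHEMATICAL ITALIC SMALL Y (supplementary), U+2081 SUBSCRIPT ONE (BMP)
        String y = new String(Character.toChars(0x1D466));
        String text = y + " - " + y + "\u2081";

        System.out.println("supplementary code points: " + countSupplementary(text));
        System.out.println("char-by-char result: " + encodeCharByChar(text));
    }
}
```

For the sample string, `encodeCharByChar` returns `?? - ??` followed by the intact subscript one, which has the same shape as the broken sharedStrings.xml text. A non-zero `countSupplementary` result is a cheap way to tell whether a given string is likely to be affected.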


On 9/8/17, 11:17 AM, "Dominik Stadler" <dominik.stadler@gmx.at> wrote:

    Hi,
    
    You might be hitting a known bug in XMLBeans, the library that POI
    currently uses for serializing the XML data; see
    https://bz.apache.org/bugzilla/show_bug.cgi?id=54084 and
    https://bz.apache.org/bugzilla/show_bug.cgi?id=59268 for quite some
    discussion of this issue. You may be able to use a beta version of a newer
    XMLBeans from
    https://github.com/pjfanning/xmlbeans/releases/tag/2.6.2; we would be
    interested to hear whether this also resolves your problem.
    
    Thanks... Dominik.
    
    On Fri, Sep 8, 2017 at 5:50 PM, John Brainard <jbrainard@glynlyon.com>
    wrote:
    
    > I’m using JXLS to generate a report in Excel and am having a hard time
    > with non-ASCII text, such as the following:
    >
    > 𝑦 = 𝑚𝑥 + 𝑏, 𝐴𝑥 + 𝐵𝑦 = 𝐶, and 𝑦 - 𝑦₁ = 𝑚(𝑥 - 𝑥₁)
    >
    > The above is rendered to the sharedStrings.xml file as:
    >
    > <sst count="1" uniqueCount="1" xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main"><si><t>?? = ???? + ??, ???? + ???? = ??, and ?? - ??₁ = ??(?? - ??₁)</t></si></sst>
    >
    > I believe I’ve narrowed it down to
    > org.openxmlformats.schemas.spreadsheetml.x2006.main.CTRst. My testing
    > shows that it stores the string correctly internally, but the text is
    > not handled correctly when it is written to sharedStrings.xml. I’m not
    > sure whether this is something I’m doing wrong or a bug somewhere in POI
    > or XmlBeans. I don’t believe the issue is in the JXLS library, as I’ve
    > isolated it to the code below:
    >
    >         String text = "𝑦 = 𝑚𝑥 + 𝑏, 𝐴𝑥 + 𝐵𝑦 = 𝐶, and 𝑦 - 𝑦₁ = 𝑚(𝑥 - 𝑥₁)";
    >         SharedStringsTable table = new SharedStringsTable();
    >         CTRst st = CTRst.Factory.newInstance();
    >         st.setT(text);
    >         table.addEntry(st);
    >
    >         ByteArrayOutputStream baos = new ByteArrayOutputStream();
    >         table.writeTo(baos);
    >         String output = baos.toString("UTF-8");
    >
    >         // This assertion passes
    >         Assert.assertEquals(st.getT(), text);
    >
    >         // This assertion fails
    >         Assert.assertEquals(output, "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n" +
    >                         "<sst count=\"1\" uniqueCount=\"1\" xmlns=\"http://schemas.openxmlformats.org/spreadsheetml/2006/main\">" +
    >                         "<si><t>𝑦 = 𝑚𝑥 + 𝑏, 𝐴𝑥 + 𝐵𝑦 = 𝐶, and 𝑦 - 𝑦₁ = 𝑚(𝑥 - 𝑥₁)</t></si></sst>");
    >
    >
    > Here’s another snippet that reproduces the issue when creating an xlsx
    > workbook:
    >
    >         XSSFWorkbook workbook = new XSSFWorkbook();
    >         XSSFSheet sheet = workbook.createSheet();
    >
    >         Row row = sheet.createRow(0);
    >         Cell cell = row.createCell(0);
    >         cell.setCellValue(TEXT);
    >
    >         FileOutputStream outputStream = new FileOutputStream(FILE_NAME);
    >         workbook.write(outputStream);
    >         workbook.close();
    >
    >
    > I’m assuming it’s something I’m doing wrong, but I have been unable to
    > find a solution. I created a GitHub repo with the above code in the hope
    > that it aids in finding one.
    >
    > https://github.com/JohnBrainard/poi-utf8-debugging
    >
    > Thank you for your help!
    >
    > John
    >
    >
    > ---------------------------------------------------------------------
    > To unsubscribe, e-mail: user-unsubscribe@poi.apache.org
    > For additional commands, e-mail: user-help@poi.apache.org
    >
    
