poi-user mailing list archives

From John Brainard <jbrain...@glynlyon.com>
Subject UTF-8 Encoding
Date Fri, 08 Sep 2017 15:50:52 GMT
I’m using JXLS to generate a report in Excel and am having a hard time with non-ASCII text,
such as the following: 

𝑦 = 𝑚𝑥 + 𝑏, 𝐴𝑥 + 𝐵𝑦 = 𝐶, and 𝑦 - 𝑦₁ = 𝑚(𝑥 - 𝑥₁)

The above is rendered to the sharedStrings.xml file as:

<sst count="1" uniqueCount="1" xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main"><si><t>?? = ???? + ??, ???? + ???? = ??, and ?? - ??₁ = ??(?? - ??₁)</t></si></sst>
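
One detail in the output above may help narrow things down: every lost letter turns into two question marks, while the subscript ₁ survives. That pattern is consistent with the letters being supplementary code points (outside the Basic Multilingual Plane), which Java stores as a surrogate pair of two `char` values. The sketch below uses only the JDK to show the difference; the class and method names are made up for illustration, and it does not show where in POI or XmlBeans the characters are actually lost.

```java
import java.nio.charset.StandardCharsets;

public class SurrogateCheck {
    // U+1D466 MATHEMATICAL ITALIC SMALL Y, written as an escaped surrogate pair
    // so the source file encoding cannot interfere.
    static final String Y = "\uD835\uDC66";
    // U+2081 SUBSCRIPT ONE, a single char inside the Basic Multilingual Plane.
    static final String SUB_ONE = "\u2081";

    // Number of UTF-16 code units (Java chars) in the string.
    static int utf16Units(String s) {
        return s.length();
    }

    // Number of Unicode code points in the string.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    // Number of bytes the string occupies when encoded as UTF-8.
    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        System.out.println("y:  chars=" + utf16Units(Y)
                + " codePoints=" + codePoints(Y)
                + " utf8Bytes=" + utf8Bytes(Y));
        System.out.println("1:  chars=" + utf16Units(SUB_ONE)
                + " codePoints=" + codePoints(SUB_ONE)
                + " utf8Bytes=" + utf8Bytes(SUB_ONE));
    }
}
```

The italic y is one code point but two chars, so a writer that handles each char on its own and cannot encode a lone surrogate would emit two replacement characters for it, matching the "??" seen in sharedStrings.xml. Whether that is what happens here is an assumption, not something this snippet proves.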

I believe I’ve narrowed it down to org.openxmlformats.schemas.spreadsheetml.x2006.main.CTRst.
My testing shows that it’s storing the string correctly internally, but when writing to
the sharedStrings.xml, the text isn’t being handled correctly. I’m not sure if this is
something I’m doing wrong, or if this is a bug somewhere in POI or XmlBeans. I don’t believe
the issue is in the JXLS library as I’ve isolated the issue to the code below:

	String text = "𝑦 = 𝑚𝑥 + 𝑏, 𝐴𝑥 + 𝐵𝑦 = 𝐶, and 𝑦 - 𝑦₁ = 𝑚(𝑥 - 𝑥₁)";
	SharedStringsTable table = new SharedStringsTable();
	CTRst st = CTRst.Factory.newInstance();
	st.setT(text);
	table.addEntry(st);

	ByteArrayOutputStream baos = new ByteArrayOutputStream();
	table.writeTo(baos);
	String output = baos.toString("UTF-8");

	// This assertion passes
	Assert.assertEquals(st.getT(), text);

	// This assertion fails
	Assert.assertEquals(output, "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n" +
			"<sst count=\"1\" uniqueCount=\"1\" xmlns=\"http://schemas.openxmlformats.org/spreadsheetml/2006/main\"><si><t>𝑦 = 𝑚𝑥 + 𝑏, 𝐴𝑥 + 𝐵𝑦 = 𝐶, and 𝑦 - 𝑦₁ = 𝑚(𝑥 - 𝑥₁)</t></si></sst>");


Here’s another snippet which reproduces the issue I’m having with creating a xlsx workbook:

	XSSFWorkbook workbook = new XSSFWorkbook();
	XSSFSheet sheet = workbook.createSheet();

	Row row = sheet.createRow(0);
	Cell cell = row.createCell(0);
	cell.setCellValue(TEXT);

	// try-with-resources closes the file stream even if write() throws
	try (FileOutputStream outputStream = new FileOutputStream(FILE_NAME)) {
		workbook.write(outputStream);
	}
	workbook.close();


I’m assuming it’s something I’m doing wrong, but have been unable to find a solution.
I created a github repo with the above code in hopes that it aids in finding a solution.

https://github.com/JohnBrainard/poi-utf8-debugging

Thank you for your help!

John

