poi-dev mailing list archives

From <Armin.Winte...@ldbv.bayern.de>
Subject POI xlsx indeterminism
Date Mon, 01 Aug 2016 11:48:45 GMT
Dear POI dev team,

We've been running into an indeterminism problem with POI's xlsx output: two workbooks
generated in exactly the same way occasionally produce different SHA-256 hash values. We
compute the hashes with the following method in TestNG test cases:

FileTest:
@Test(enabled = true) // indeterminism at random iterations, such as 400 or 1290
public void emptyXLSXTest() throws IOException, NoSuchAlgorithmException {
    final Hasher hasher = new HasherImpl();
    boolean differentSHA256Hash = false;
    for (int i = 0; i < 10000; i++) {
        final ByteArrayOutputStream excelAdHoc1 = BusinessPlanInMemory.getEmptyExcel("xlsx");
        final ByteArrayOutputStream excelAdHoc2 = BusinessPlanInMemory.getEmptyExcel("xlsx");

        byte[] expectedByteArray = excelAdHoc1.toByteArray();
        String expectedSha256 = hasher.sha256(expectedByteArray);
        byte[] actualByteArray = excelAdHoc2.toByteArray();
        String actualSha256 = hasher.sha256(actualByteArray);

        if (!expectedSha256.equals(actualSha256)) {
            differentSHA256Hash = true;
            System.out.println("ITERATION: " + i);
            System.out.println("EXPECTED HASH: " + expectedSha256);
            System.out.println("ACTUAL HASH: " + actualSha256);
            break;
        }
    }
    Assert.assertTrue(differentSHA256Hash, "Indeterminism did not occur");
}


Referenced Hasher and POI code:

HasherImpl:
public String sha256(final InputStream stream) throws IOException, NoSuchAlgorithmException {
    final MessageDigest digest = MessageDigest.getInstance("SHA-256");
    final byte[] bytesBuffer = new byte[300000];
    int bytesRead = -1;
    while ((bytesRead = stream.read(bytesBuffer)) != -1) {
        digest.update(bytesBuffer, 0, bytesRead);
    }
    final byte[] hashedBytes = digest.digest();
    return bytesToHex(hashedBytes);
}
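
The test above passes a byte[] to hasher.sha256, while the method shown here takes an
InputStream, and bytesToHex is not shown. For readers who want to reproduce the test, the
following is a minimal sketch of what a byte[] variant and the hex helper might look like.
The class name Sha256Sketch and both method bodies are assumptions made for illustration,
not the original HasherImpl code.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical stand-in for the parts of HasherImpl that the message does not show.
class Sha256Sketch {

    // Assumed byte[] counterpart of sha256(InputStream): hashes the whole array at once.
    static String sha256(final byte[] data) {
        try {
            final MessageDigest digest = MessageDigest.getInstance("SHA-256");
            return bytesToHex(digest.digest(data));
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is a required algorithm on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    // Assumed helper: lower-case hex, two characters per byte.
    static String bytesToHex(final byte[] bytes) {
        final StringBuilder hex = new StringBuilder(bytes.length * 2);
        for (final byte b : bytes) {
            hex.append(String.format("%02x", b & 0xff));
        }
        return hex.toString();
    }

    public static void main(final String[] args) {
        System.out.println(sha256("abc".getBytes(StandardCharsets.US_ASCII)));
    }
}
```

Hashing the byte array directly avoids wrapping it in a ByteArrayInputStream; the result
should be the same as streaming the same bytes through the InputStream variant.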


We tried to rule out indeterminism caused by metadata such as the creation time, to no avail:

public static ByteArrayOutputStream getEmptyExcel(final String fileextension) throws IOException {
    Workbook wb;

    if (fileextension.equals("xls")) {
        wb = new HSSFWorkbook();
    }
    else {
        wb = new XSSFWorkbook();
        final POIXMLProperties props = ((XSSFWorkbook) wb).getProperties();
        final POIXMLProperties.CoreProperties coreProp = props.getCoreProperties();
        coreProp.setCreated("");
        coreProp.setIdentifier("1");
        coreProp.setModified("");
    }

    wb.createSheet();

    final ByteArrayOutputStream excelStream = new ByteArrayOutputStream();
    wb.write(excelStream);
    wb.close();
    return excelStream;
}


Indeterminism occurs at random iterations, such as 400 or 1290, and we have not yet found
out why this happens.
Do you have any idea what might be causing the problem: some metadata flag we have not
addressed, internal compression, or something else?
HSSF (".xls"), by contrast, does not seem to have the same issue.
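
Since an .xlsx file is a ZIP container, one way to narrow this down might be to compare the
two byte arrays entry by entry instead of hashing them as a whole. The sketch below uses only
java.util.zip and reports, per entry name, whether the stored modification time or the
content CRC differs. The class ZipDiffSketch and its methods are hypothetical helpers; the
per-entry timestamp is only one candidate worth checking, not a confirmed cause.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.zip.CRC32;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical diagnostic helper, not part of POI.
class ZipDiffSketch {

    // Maps each entry name to {modification time, CRC-32 of the uncompressed content}.
    static Map<String, long[]> summarize(final byte[] zipBytes) {
        final Map<String, long[]> entries = new LinkedHashMap<>();
        try (ZipInputStream zin = new ZipInputStream(new ByteArrayInputStream(zipBytes))) {
            final byte[] buffer = new byte[8192];
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                final CRC32 crc = new CRC32();
                int n;
                while ((n = zin.read(buffer)) != -1) {
                    crc.update(buffer, 0, n);
                }
                entries.put(entry.getName(), new long[] {entry.getTime(), crc.getValue()});
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return entries;
    }

    // Lists the differences between two ZIP archives, one line per finding.
    static List<String> diff(final byte[] first, final byte[] second) {
        final Map<String, long[]> a = summarize(first);
        final Map<String, long[]> b = summarize(second);
        final List<String> findings = new ArrayList<>();
        for (final Map.Entry<String, long[]> e : a.entrySet()) {
            final long[] other = b.get(e.getKey());
            if (other == null) {
                findings.add(e.getKey() + ": missing in second archive");
                continue;
            }
            if (e.getValue()[0] != other[0]) {
                findings.add(e.getKey() + ": modification time differs");
            }
            if (e.getValue()[1] != other[1]) {
                findings.add(e.getKey() + ": content differs");
            }
        }
        for (final String name : b.keySet()) {
            if (!a.containsKey(name)) {
                findings.add(name + ": missing in first archive");
            }
        }
        return findings;
    }

    // Builds a one-entry archive with an explicit entry time; used only to try out diff().
    static byte[] zipOf(final String name, final String content, final long timeMillis) {
        final ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ZipOutputStream zout = new ZipOutputStream(out)) {
            final ZipEntry entry = new ZipEntry(name);
            entry.setTime(timeMillis);
            zout.putNextEntry(entry);
            zout.write(content.getBytes(StandardCharsets.UTF_8));
            zout.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }
}
```

In the test above, calling ZipDiffSketch.diff(expectedByteArray, actualByteArray) inside the
branch where the hashes differ would print which entries changed and in what respect.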


Best regards,
Armin


