commons-dev mailing list archives

From Gary Gregory <garydgreg...@gmail.com>
Subject [IO] BOMInputStream bug?
Date Fri, 10 Aug 2012 17:44:27 GMT
Hi All:

Does anyone have expertise with BOMInputStream?

I know that some XML parsers (like the one shipped with the Oracle JRE) do
not detect UTF-32 BOMs (UTF-8 and UTF-16 BOMs are OK) but using
BOMInputStream is supposed to fix the issue.
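For context, here is a minimal sketch of BOM sniffing in plain JDK Java. It is not the Commons IO BOMInputStream implementation, and the names BomSniffer, Bom and detect are hypothetical. The point it shows is ordering. The UTF-32LE BOM (FF FE 00 00) begins with the UTF-16LE BOM (FF FE), so a detector that walks a list of candidates in order must try the 4-byte marks before the 2-byte ones. Otherwise a UTF-32LE stream gets reported as UTF-16LE.

```java
public class BomSniffer {

    // Hypothetical BOM table for illustration; not the Commons IO ByteOrderMark class.
    // Declared longest-first so a 4-byte mark wins over a 2-byte prefix of it.
    enum Bom {
        UTF_32BE("UTF-32BE", 0x00, 0x00, 0xFE, 0xFF),
        UTF_32LE("UTF-32LE", 0xFF, 0xFE, 0x00, 0x00),
        UTF_8("UTF-8", 0xEF, 0xBB, 0xBF),
        UTF_16BE("UTF-16BE", 0xFE, 0xFF),
        UTF_16LE("UTF-16LE", 0xFF, 0xFE);

        final String charsetName;
        final int[] bytes;

        Bom(String charsetName, int... bytes) {
            this.charsetName = charsetName;
            this.bytes = bytes;
        }

        // True if the leading bytes of 'head' equal this BOM.
        boolean matches(byte[] head) {
            if (head.length < bytes.length) {
                return false;
            }
            for (int i = 0; i < bytes.length; i++) {
                if ((head[i] & 0xFF) != bytes[i]) {
                    return false;
                }
            }
            return true;
        }
    }

    // Returns the first matching BOM in declaration order, or null if none matches.
    static Bom detect(byte[] head) {
        for (Bom bom : Bom.values()) {
            if (bom.matches(head)) {
                return bom;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // UTF-32LE BOM followed by "<" encoded in UTF-32LE
        byte[] utf32le = {(byte) 0xFF, (byte) 0xFE, 0x00, 0x00, 0x3C, 0x00, 0x00, 0x00};
        Bom bom = detect(utf32le);
        System.out.println(bom + " -> " + bom.charsetName); // prints UTF_32LE -> UTF-32LE
    }
}
```

If UTF_16LE were declared before UTF_32LE, the same input would be misreported as UTF-16LE followed by a U+0000 character.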

The following tests, which I added and marked @Ignore, fail:

   - org.apache.commons.io.input.BOMInputStreamTest.testReadXmlWithBOMUtf32Be()
   - org.apache.commons.io.input.BOMInputStreamTest.testReadXmlWithBOMUtf32Le()

More basic tests do work:

   - org.apache.commons.io.input.BOMInputStreamTest.testReadWithBOMUtf32Be()
   - org.apache.commons.io.input.BOMInputStreamTest.testReadWithBOMUtf32Le()

When I look at the Oracle JRE (which uses a copy of Xerces), I see code that
deals with UCS-4, a precursor of UTF-32 (much as UCS-2 is a subset of
UTF-16). But as the tests show, Xerces fails to parse a UTF-32 document.
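One possible workaround, sketched here under the assumption that the document starts with a UTF-32 BOM: consume the BOM yourself, decode the rest with the JDK's UTF-32BE/UTF-32LE charsets, and hand the parser a character stream through org.xml.sax.InputSource. The parser then never runs its own UCS-4 autodetection. Per the InputSource contract, a parser given a character stream disregards any encoding declaration in the document. Utf32XmlWorkaround and parse are hypothetical names, and this is not necessarily how BOMInputStream or the tests above are meant to be used.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PushbackInputStream;
import java.io.Reader;
import java.nio.charset.Charset;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class Utf32XmlWorkaround {

    // Hypothetical helper: if the stream starts with a UTF-32 BOM, strip it and
    // give the parser a decoded character stream; otherwise parse the bytes as-is.
    static Document parse(InputStream raw) throws Exception {
        PushbackInputStream in = new PushbackInputStream(raw, 4);
        byte[] head = new byte[4];
        int n = in.readNBytes(head, 0, 4); // Java 9+

        String charset = null;
        if (n == 4 && head[0] == 0 && head[1] == 0
                && (head[2] & 0xFF) == 0xFE && (head[3] & 0xFF) == 0xFF) {
            charset = "UTF-32BE";
        } else if (n == 4 && (head[0] & 0xFF) == 0xFF && (head[1] & 0xFF) == 0xFE
                && head[2] == 0 && head[3] == 0) {
            charset = "UTF-32LE";
        }

        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        if (charset == null) {
            // No UTF-32 BOM: push the bytes back and let the parser autodetect.
            if (n > 0) {
                in.unread(head, 0, n);
            }
            return builder.parse(in);
        }
        // The BOM has been consumed; the parser now reads characters, not bytes.
        Reader reader = new InputStreamReader(in, Charset.forName(charset));
        return builder.parse(new InputSource(reader));
    }

    public static void main(String[] args) throws Exception {
        byte[] body = "<root>hi</root>".getBytes(Charset.forName("UTF-32LE"));
        byte[] doc = new byte[4 + body.length];
        doc[0] = (byte) 0xFF;
        doc[1] = (byte) 0xFE; // doc[2] and doc[3] stay 0x00
        System.arraycopy(body, 0, doc, 4, body.length);
        Document d = parse(new ByteArrayInputStream(doc));
        System.out.println(d.getDocumentElement().getTextContent()); // prints hi
    }
}
```

This sidesteps the parser's byte-level encoding detection rather than fixing it, so it only helps when a BOM is actually present.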

Any thoughts?
Thank you,
Gary

-- 
E-Mail: garydgregory@gmail.com | ggregory@apache.org
JUnit in Action, 2nd Ed: http://bit.ly/ECvg0
Spring Batch in Action: <http://s.apache.org/HOq>http://bit.ly/bqpbCK
Blog: http://garygregory.wordpress.com
Home: http://garygregory.com/
Tweet! http://twitter.com/GaryGregory
