cxf-users mailing list archives

From "Pieper, Aaron" <Piep...@Pragmatics.com>
Subject Possible bug in MimeBodyPartInputStream - should I report this?
Date Thu, 14 Oct 2010 22:56:25 GMT
I'm using CXF 2.2.10. I'm having a problem with some MTOM attachments. It started when I upgraded
from CXF 2.2.2 to CXF 2.2.3. The bug is that, after calling a service that returns an MTOM
attachment, parsing the attachment sometimes fails with this error:

java.io.IOException: Underlying input stream returned zero bytes
	at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:268)
	at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:306)
	at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:158)
	at java.io.InputStreamReader.read(InputStreamReader.java:167)
	at java.io.Reader.read(Reader.java:123)
	at org.apache.commons.io.IOUtils.copyLarge(IOUtils.java:1128)
	at org.apache.commons.io.IOUtils.copy(IOUtils.java:1104)
	at org.apache.commons.io.IOUtils.copy(IOUtils.java:1050)
	at org.apache.commons.io.IOUtils.toString(IOUtils.java:359)
	at com.pragmatics.AsyncUtils.messageToString(AsyncUtils.java:18)
 
The error only happens for some attachments, about 25% of them. The 25% seems arbitrary:
it isn't the largest attachments or the ones that contain special characters. I was able
to track this down to MimeBodyPartInputStream. MimeBodyPartInputStream has some logic in processBuffer
for reading the boundary. It goes like this:

while ((boundaryIndex < boundary.length) && (value == boundary[boundaryIndex]))
{
 if (!hasData(buffer, initialI, i + 1, off, len)) {
  return initialI - off;
 }
 value = buffer[++i];
 boundaryIndex++;
}

So, basically, when MimeBodyPartInputStream finds the start of a boundary, it keeps reading
until either there are no more bytes to read or it has read the entire boundary.
The problem with this logic is that it assumes the entire boundary will be read in the same
call to the underlying InputStream. This assumption isn't always true. Specifically, when
I'm fetching an attachment in my application, this MimeBodyPartInputStream is backed by an
HttpURLConnection.HttpInputStream. This HttpInputStream sometimes returns as few as 24 bytes from
a single read; I guess that's just how the HttpInputStream works. But if those 24 bytes happen to fall
on one of these MIME boundaries, it can cause problems.

One problem, which I'm running into here, is that the MimeBodyPartInputStream's read(byte[], int, int)
method returns 0, since the only bytes that were read were parts of the MIME boundary. In
returning 0, it breaks InputStream's contract, which states that for a non-zero len the read method
returns either a positive integer (if some bytes were read) or -1 (if the end of the stream was reached).
There are probably other possible problems: it seems possible that MimeBodyPartInputStream
might misjudge whether or not it has hit a boundary in some cases. I haven't run into that
problem though.
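
To illustrate why a 0 return surfaces as the exception above, here is a minimal sketch that does not involve CXF at all. ZeroOnceInputStream and readOrError are hypothetical names made up for this example. The stream deliberately returns 0 from its first bulk read, and wrapping it in an InputStreamReader should then fail inside the JDK's decoder in the same way as the stack trace.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

public class ZeroReadDemo {

    /** Hypothetical stream that breaks the contract: its first bulk read returns 0. */
    static class ZeroOnceInputStream extends FilterInputStream {
        private boolean returnedZero = false;

        ZeroOnceInputStream(InputStream in) {
            super(in);
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            if (!returnedZero && len > 0) {
                returnedZero = true;
                return 0; // InputStream does not allow this when len > 0
            }
            return super.read(b, off, len);
        }
    }

    /** Decodes the stream as UTF-8 text, or returns the exception message if decoding fails. */
    static String readOrError(InputStream in) {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8)) {
            char[] buf = new char[64];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
            return sb.toString();
        } catch (IOException e) {
            return "IOException: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        byte[] data = "attachment body".getBytes(StandardCharsets.UTF_8);
        // Well-behaved stream: decodes normally.
        System.out.println(readOrError(new ByteArrayInputStream(data)));
        // Contract-breaking stream: the reader is expected to give up with an IOException.
        System.out.println(readOrError(new ZeroOnceInputStream(new ByteArrayInputStream(data))));
    }
}
```

The point of the sketch is only that the failure comes from the reader on top of the stream, so any InputStream that returns 0 for a non-empty buffer can trigger it, regardless of where the bytes come from.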

I was hesitant to submit a tracker for this issue, since I don't 100% understand all of the
pieces involved. Since the bug is dependent on HttpInputStream, I haven't really been able
to create a test case for it, unless I do weird things like create my own InputStream class
which behaves in weird ways. Should I submit it anyway? It fortunately only causes problems
in my test code, but it seems like an important issue.
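
For what it's worth, the "own InputStream class" approach does not have to be very weird. The following is a rough sketch of a test helper that imitates short reads by capping how many bytes each bulk read may return. ShortReadInputStream, drain and the 24-byte chunk size are assumptions chosen for this example, not anything taken from CXF or HttpURLConnection, and the sketch does not show how to plug the stream into CXF's attachment handling.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

public class ShortReadDemo {

    /** Hypothetical test helper: never hands back more than maxChunk bytes per bulk read. */
    static class ShortReadInputStream extends FilterInputStream {
        private final int maxChunk;

        ShortReadInputStream(InputStream in, int maxChunk) {
            super(in);
            if (maxChunk < 1) {
                throw new IllegalArgumentException("maxChunk must be >= 1");
            }
            this.maxChunk = maxChunk;
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            // Shrink the request so the caller sees short reads, as it might over HTTP.
            return super.read(b, off, Math.min(len, maxChunk));
        }
    }

    /** Copies the stream into sink and returns the largest single read size observed. */
    static int drain(InputStream in, ByteArrayOutputStream sink) throws IOException {
        byte[] buf = new byte[4096];
        int largest = 0;
        int n;
        while ((n = in.read(buf, 0, buf.length)) != -1) {
            largest = Math.max(largest, n);
            sink.write(buf, 0, n);
        }
        return largest;
    }

    public static void main(String[] args) throws IOException {
        // Placeholder payload; a real test would use a captured multipart response body.
        byte[] payload = new byte[100];
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        int largest = drain(new ShortReadInputStream(new ByteArrayInputStream(payload), 24), sink);
        System.out.println("largest read = " + largest + ", total = " + sink.size());
    }
}
```

Feeding a captured response through a wrapper like this, with the chunk size varied so that a read ends partway through a boundary, might be enough to reproduce the problem without a live HTTP connection.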
 
- Aaron
