hadoop-common-dev mailing list archives

From Raghu Angadi <rang...@yahoo-inc.com>
Subject Problem with InputStream.skip()
Date Fri, 25 May 2007 22:48:30 GMT
In Hadoop, whenever possible, we read directly into the user's buffer. For
example, in ChecksumFileSystem we read into the user buffer and then verify
the checksum over those bytes; I do the same in the new block-level CRCs.
This is very useful because it avoids an extra copy in most cases.
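A minimal sketch of the "read into the user buffer, then verify in place" pattern, assuming a single CRC32 over the whole requested range. The class InPlaceVerify and the method readAndVerify are hypothetical illustrations, not Hadoop's ChecksumFileSystem code:

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.CRC32;

// Hypothetical helper: fills the caller's buffer, then checksums those same bytes.
final class InPlaceVerify {
    private InPlaceVerify() {}

    /** Reads exactly len bytes into buf[off, off+len) and checks their CRC32. */
    static void readAndVerify(InputStream in, byte[] buf, int off, int len,
                              long expectedCrc) throws IOException {
        int done = 0;
        while (done < len) {
            int nr = in.read(buf, off + done, len - done);
            if (nr < 0) {
                throw new EOFException("stream ended after " + done + " of " + len + " bytes");
            }
            done += nr;
        }
        // The checksum is computed over the caller's buffer itself: no extra copy,
        // but anything else writing into buf before this point corrupts the result.
        CRC32 crc = new CRC32();
        crc.update(buf, off, len);
        if (crc.getValue() != expectedCrc) {
            throw new IOException("checksum mismatch");
        }
    }
}
```

The saving is that the bytes land once, in their final place. The cost is that correctness now depends on nobody else touching that buffer between the read and the verification.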

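We don't define skip() for our extensions of InputStream, since we know the
default implementation calls read(). The problem is that InputStream.skip()
reads into a *static* byte buffer shared by every InputStream instance. From
its own perspective that makes sense: skipped bytes are discarded, so one
scratch buffer is enough. But our read() verifies checksums over the buffer
it was handed, so if two unrelated streams call skip() in parallel, one
stream can overwrite the shared buffer after the other has read into it but
before it has verified the checksum, and we get spurious checksum errors.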
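The race can be replayed deterministically by writing out one bad interleaving step by step. This is a simplified model, not JDK code: SHARED_SCRATCH is a hypothetical stand-in for the static skip buffer described above, and the two System.arraycopy calls stand in for each stream's read() landing bytes in it:

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Deterministic model of two unrelated streams sharing one static scratch buffer.
final class SharedScratchDemo {
    // Hypothetical stand-in for a static skip buffer shared by all streams.
    private static final byte[] SHARED_SCRATCH = new byte[8];

    static long crcOf(byte[] b, int off, int len) {
        CRC32 crc = new CRC32();
        crc.update(b, off, len);
        return crc.getValue();
    }

    /** Returns true if stream A's in-place check fails after B's interleaved read. */
    static boolean interleavedVerifyFails() {
        byte[] blockA = "AAAAAAAA".getBytes(StandardCharsets.US_ASCII);
        byte[] blockB = "BBBBBBBB".getBytes(StandardCharsets.US_ASCII);
        long expectedA = crcOf(blockA, 0, blockA.length);

        // 1. A's skip() calls A.read(SHARED_SCRATCH): A's bytes land in the buffer.
        System.arraycopy(blockA, 0, SHARED_SCRATCH, 0, blockA.length);
        // 2. Before A verifies, B's skip() calls B.read(SHARED_SCRATCH) and overwrites it.
        System.arraycopy(blockB, 0, SHARED_SCRATCH, 0, blockB.length);
        // 3. A checksums the buffer in place, sees B's bytes, and reports an error.
        return crcOf(SHARED_SCRATCH, 0, blockA.length) != expectedA;
    }
}
```

With real threads the window is timing dependent, which is why such failures look like intermittent data corruption rather than a bug in skip().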

When this happened with the block-level CRCs, I wasted time trying to find
a bug in the new code.

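My preferred fix would be to implement skip() at the Hadoop level. Always
reading into an internal buffer and copying to the user buffer would be a
very defensive fix, but it gives up the extra copy we were trying to avoid.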
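One way to implement skip() at the Hadoop level is sketched below, assuming a per-instance scratch buffer is acceptable. PrivateSkipInputStream and the 2048-byte size are hypothetical choices, not Hadoop code. Skipped data still goes through this stream's own read(), so a checksumming subclass would still verify it, but no buffer is shared between unrelated streams:

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical base class: skip() reads through this stream using a private buffer.
class PrivateSkipInputStream extends FilterInputStream {
    private static final int SKIP_BUFFER_SIZE = 2048; // arbitrary choice for the sketch
    private byte[] skipBuffer; // per instance, allocated on first skip()

    PrivateSkipInputStream(InputStream in) {
        super(in);
    }

    @Override
    public long skip(long n) throws IOException {
        if (n <= 0) {
            return 0;
        }
        if (skipBuffer == null) {
            skipBuffer = new byte[SKIP_BUFFER_SIZE];
        }
        long remaining = n;
        while (remaining > 0) {
            int chunk = (int) Math.min(skipBuffer.length, remaining);
            // Goes through this.read(), so subclass checksum logic still runs.
            int nr = read(skipBuffer, 0, chunk);
            if (nr < 0) {
                break; // end of stream: report how much was actually skipped
            }
            remaining -= nr;
        }
        return n - remaining;
    }
}
```

The buffer is only private to the stream, so a single stream must still not be used from several threads at once; that is already the usual contract for InputStream.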

