hadoop-common-dev mailing list archives

From Raghu Angadi <rang...@yahoo-inc.com>
Subject Re: Problem with InputStream.skip()
Date Fri, 25 May 2007 22:56:58 GMT

Also, reading from a block supports a 'real skip', i.e., it does not
verify the checksum when an entire checksum block (usually 512 bytes)
falls within the skip range. That is another reason to implement our
own skip().
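
To make the 'real skip' idea concrete, here is a rough sketch of the
arithmetic involved. It is not the Hadoop code: the SkipPlan class, its
field names, and the plan() method are hypothetical, and it assumes
checksum blocks are aligned to multiples of bytesPerChecksum from the
start of the data. It only splits a skip range into the bytes that sit in
partially covered checksum blocks and the checksum blocks that lie
entirely inside the range.

```java
/** Hypothetical helper for illustration; not part of Hadoop. */
final class SkipPlan {
    final long headBytes;            // bytes before the first fully covered checksum block
    final long wholeChecksumBlocks;  // checksum blocks lying entirely inside the skip range
    final long tailBytes;            // bytes after the last fully covered checksum block

    private SkipPlan(long headBytes, long wholeChecksumBlocks, long tailBytes) {
        this.headBytes = headBytes;
        this.wholeChecksumBlocks = wholeChecksumBlocks;
        this.tailBytes = tailBytes;
    }

    /**
     * Splits the range [pos, pos + n) around checksum-block boundaries.
     * Assumes checksum blocks start at multiples of bytesPerChecksum.
     */
    static SkipPlan plan(long pos, long n, int bytesPerChecksum) {
        if (pos < 0 || n < 0 || bytesPerChecksum <= 0) {
            throw new IllegalArgumentException("pos and n must be >= 0, bytesPerChecksum > 0");
        }
        long end = pos + n;
        // First boundary at or after pos (round up).
        long firstBoundary = ((pos + bytesPerChecksum - 1) / bytesPerChecksum) * bytesPerChecksum;
        // Last boundary at or before end (round down).
        long lastBoundary = (end / bytesPerChecksum) * bytesPerChecksum;
        if (firstBoundary >= lastBoundary) {
            // No checksum block is fully covered; the whole range is partial.
            return new SkipPlan(n, 0, 0);
        }
        return new SkipPlan(
                firstBoundary - pos,
                (lastBoundary - firstBoundary) / bytesPerChecksum,
                end - lastBoundary);
    }
}
```

For example, SkipPlan.plan(100, 2000, 512) yields 412 head bytes, 3 whole
checksum blocks, and 52 tail bytes. In this sketch only the 3 whole
checksum blocks could be passed over without verification; the head and
tail bytes belong to checksum blocks that are only partly skipped.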

Raghu Angadi wrote:
> In Hadoop, whenever possible, we read directly into the user's buffer.
> E.g., in ChecksumFileSystem we read into the user's buffer and then
> compute the checksum; I do the same in the new Block level CRCs. This is
> very useful since it avoids an extra copy in most cases.
> 
> We don't define skip() for our extensions of InputStream since we know
> the default implementation calls read(). But the problem is that
> InputStream.skip() uses a *static* byte buffer (from its perspective, it
> makes sense). So if two skip() calls run in parallel on unrelated
> streams, both read into that shared buffer and then checksum it, and we
> will surely get checksum errors.
> 
> When this happened with Block level CRCs, I wasted time trying to find a 
> bug in the new code.
> 
> My preferred fix would be to implement skip() at the Hadoop level.
> Always copying to the user buffer would be a very defensive fix.
> 
> Raghu.
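
As an illustration of the preferred fix described above, here is a minimal
sketch of a stream that implements its own skip() instead of inheriting
the default one. PrivateBufferSkipStream is a hypothetical name, not a
Hadoop class, and the 2048-byte buffer size is an arbitrary choice for
this example. The only point is that the scratch buffer belongs to one
stream instance, so two unrelated streams can never write into the same
buffer while skipping.

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical wrapper for illustration; not part of Hadoop. */
class PrivateBufferSkipStream extends FilterInputStream {
    private static final int SKIP_BUFFER_SIZE = 2048; // arbitrary size for this sketch
    private byte[] skipBuffer; // owned by this instance, allocated on first skip()

    PrivateBufferSkipStream(InputStream in) {
        super(in);
    }

    @Override
    public long skip(long n) throws IOException {
        if (n <= 0) {
            return 0;
        }
        if (skipBuffer == null) {
            skipBuffer = new byte[SKIP_BUFFER_SIZE];
        }
        long remaining = n;
        while (remaining > 0) {
            int want = (int) Math.min(skipBuffer.length, remaining);
            // Goes through this stream's own read(), so a subclass that
            // verifies checksums in read() still verifies skipped data.
            int got = read(skipBuffer, 0, want);
            if (got < 0) {
                break; // end of stream reached before n bytes were skipped
            }
            remaining -= got;
        }
        return n - remaining;
    }
}
```

Like most streams, this sketch is not safe for concurrent use of a single
instance; it only removes the sharing between different streams.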

