hadoop-common-dev mailing list archives

From "Bwolen Yang" <wbwo...@gmail.com>
Subject Re: \r\n problem in LineRecordReader.java
Date Wed, 13 Jun 2007 05:29:26 GMT
Blah. Forgot to mention that the code came from
   ChecksumFileSystem.FSInputChecker.read()


On 6/12/07, Bwolen Yang <wbwolen@gmail.com> wrote:
> Here is probably the cause of this bug:
>
>     public int read(byte b[], int off, int len) throws IOException {
>       // make sure that it ends at a checksum boundary
>       long curPos = getPos();
>       long endPos = len+curPos/bytesPerSum*bytesPerSum;
>       return readBuffer(b, off, (int)(endPos-curPos));
>     }
>
> Here, the caller asks for 127 bytes, and bytesPerSum is 256.  Because
> / and * bind tighter than +, endPos is len plus curPos rounded down to
> a checksum boundary, so endPos-curPos works out to
> len - curPos % bytesPerSum, which becomes a negative number (e.g., -381)
> whenever curPos % bytesPerSum exceeds len.  readBuffer() then gets
> called with a negative length, for which the underlying
> DFSInputStream::read() naturally returns 0.  When readBuffer() sees 0
> bytes read, it assumes end-of-file and returns -1.  The rest of the
> system treats -1 as end-of-file, and hence a large part of the input
> file is never read.
>
> My guess is that small reads are somehow triggered by mark()/reset(),
> which in turn exposes this bug.  Any suggestions for how to fix this?
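The length arithmetic above can be checked with a small standalone sketch. It is not Hadoop code: ChecksumBoundaryDemo, quotedLength and boundaryAlignedLength are hypothetical names. quotedLength copies the expression from the quoted read(); boundaryAlignedLength is only one possible correction (round the end of the request down to a checksum boundary) and is an assumption, not necessarily the fix that went into Hadoop.

```java
// Hypothetical standalone sketch; not taken from the Hadoop source tree.
public class ChecksumBoundaryDemo {

    // Same expression as the quoted FSInputChecker.read():
    // evaluates as len + (curPos / bytesPerSum) * bytesPerSum.
    static int quotedLength(long curPos, int len, int bytesPerSum) {
        long endPos = len + curPos / bytesPerSum * bytesPerSum;
        return (int) (endPos - curPos);
    }

    // Assumed alternative: round the END of the request down to a
    // checksum boundary; if no boundary lies inside the request,
    // read the whole request. The result is never negative.
    static int boundaryAlignedLength(long curPos, int len, int bytesPerSum) {
        long end = curPos + len;
        long lastBoundary = end / bytesPerSum * bytesPerSum;
        if (lastBoundary > curPos) {
            return (int) (lastBoundary - curPos); // stop exactly on the boundary
        }
        return len; // request does not cross a boundary
    }

    public static void main(String[] args) {
        // 127-byte request, 256-byte chunks, position 400 (400 % 256 = 144 > 127)
        System.out.println(quotedLength(400, 127, 256));          // -17
        System.out.println(boundaryAlignedLength(400, 127, 256)); // 112
        // With 512-byte chunks and position 508, the quoted formula yields -381
        System.out.println(quotedLength(508, 127, 512));          // -381
        System.out.println(boundaryAlignedLength(508, 127, 512)); // 4
    }
}
```

Returning fewer bytes than requested is allowed by the InputStream.read(byte[], int, int) contract, so stopping on a boundary only requires that callers keep reading until they see -1.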
>
> BTW, appended below is roughly what the call stack looks like for a
> distributed file system read.  Hopefully it will help the next person
> who wants to understand how read() works.  Note that FSDataInputStream
> gets wrapped twice.
>
> bwolen
>
>
> What happens on a DFS read.  Note that in some of these steps a
> subclass calls its superclass's read(); in others a class calls its
> member variable's read().
>
>   FSDataInputStream::read() == DataInputStream::read()
>   Buffer::read()
>   BufferedInputStream::read()
>   PositionCache::read()
>   FSInputChecker::read()
>   DFSDataInputStream::read() == FSDataInputStream::read() == DataInputStream...
>   Buffer::read()
>   BufferedInputStream::read()
>   PositionCache::read()
>   DFSInputStream::read()
>   DataInputStream::read()    (blockStream)
>   BufferedInputStream::read()
>   socket
>
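A chain like the one above can be traced outside Hadoop by inserting a logging decorator between layers. This is an illustration under assumptions: Tracing, ReadTraceDemo and traceOneRead are hypothetical names, and plain JDK classes stand in for the Hadoop ones (DataInputStream for FSDataInputStream, BufferedInputStream for the buffering layer, a ByteArrayInputStream for the socket).

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

public class ReadTraceDemo {

    // Hypothetical tracing decorator: records each bulk read that passes
    // through it, then delegates to the wrapped stream unchanged.
    static class Tracing extends FilterInputStream {
        private final String label;
        private final List<String> log;

        Tracing(String label, InputStream in, List<String> log) {
            super(in);
            this.label = label;
            this.log = log;
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            log.add(label + " len=" + len);
            return super.read(b, off, len);
        }
    }

    // Builds outer -> DataInputStream -> middle -> BufferedInputStream(256)
    // -> inner -> 64 bytes of data, reads 10 bytes from the outermost
    // stream, and returns the order in which the traced layers were entered.
    static List<String> traceOneRead() throws IOException {
        List<String> log = new ArrayList<>();
        InputStream source = new ByteArrayInputStream(new byte[64]);
        InputStream inner = new Tracing("inner", source, log);
        InputStream buffered = new BufferedInputStream(inner, 256);
        InputStream middle = new Tracing("middle", buffered, log);
        InputStream outer = new Tracing("outer", new DataInputStream(middle), log);
        outer.read(new byte[10], 0, 10);
        return log;
    }

    public static void main(String[] args) throws IOException {
        // Prints: [outer len=10, middle len=10, inner len=256]
        System.out.println(traceOneRead());
    }
}
```

The buffering layer turns the caller's 10-byte request into a 256-byte request to the stream beneath it, so the length an inner read() receives need not match what the outermost caller asked for.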
