hadoop-hdfs-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "John George (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HDFS-2034) length in getBlockRange becomes -ve when reading only from currently being written blk
Date Mon, 06 Jun 2011 16:55:59 GMT

     [ https://issues.apache.org/jira/browse/HDFS-2034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel

John George updated HDFS-2034:

    Attachment: HDFS-2034-1.patch

> Daryn Sharp commented on HDFS-2034:
> -----------------------------------
> * I'd suggest changing {{"lastblkcomplete but read past that"}} to something 
> more grammatically correct (I'm not the police!) like {{"can't read past last 
> completed block"}}.

Good point.

> * There's an assertion for reading past the last *completed* block, but not 
> for reading past the last *located* block?

The reason that I added assert for *completed* blocks is because I was changing a previous
assumption in that code. So, I wanted to make sure that I was not breaking the logic behind
it. I should probably remove this assert.

The reason I am hesitant about adding an assert for reading past the last block is because
it has to go out of its way to calculate the length of all the blocks (which is already being
done by the caller of this function) and that does not seem to be the job of this function.

> * I found the names of the booleans to not be clear at face value.  Ie. I 
> couldn't tell what they meant if I didn't see the assignment.

On reading the booleans today, I have to say I agree with you on two of the three boolean

Changed readOnlyIncompleteBlk to readOffsetWithinCompleteBlk (now it means the opposite of
what it meant before, but it seems to help the code logic).

Changed readPastCompletedBlk to readLengthPastCompletedBlock.
I would like to continue to use lengthOfCompleteBlk instead of locatedBlocks.getFileLength()
because (and as Todd suggested in HDFS-1907), it make sense to call it something related to
completeBlocks as opposed to FileLength.

> For your consideration, maybe something like this with a sprinkling of 
> comments:
> {code}
> if (locatedBlocks.isLastBlockComplete())
>   assert(!readOffsetPastCompletedBlock) : "can't read past last completed 
> block";
> if (!readOffsetPastCompletedBlock) {
>   if (readLengthPastCompletedBlock) {
>     length = locatedBlocks.getFileLength() - offset;
>   }
>   blocks = getFinalizedBlockRange(offset, length);
> }
> if (readOffsetPastCompletedBlock) {
>   Block lastBlock = locatedBlocks.getLastLocatedBlock();
>   long lastBlockEndOffset = lastBlock.getStartOffset() + 
> lastBlock.getBlockSize();
>   assert(offset < lastBlockEndOffset) : "can't read past last located block";
>   blocks.add(lastLocatedBlock);
> }
> {code}

Thanks for your suggestions.

> length in getBlockRange becomes -ve when reading only from currently being written blk
> --------------------------------------------------------------------------------------
>                 Key: HDFS-2034
>                 URL: https://issues.apache.org/jira/browse/HDFS-2034
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: John George
>            Assignee: John George
>            Priority: Minor
>         Attachments: HDFS-2034-1.patch, HDFS-2034-1.patch, HDFS-2034.patch
> This came up during HDFS-1907. Posting an example that Todd posted in HDFS-1907 that
brought out this issue.
> {quote}
> Here's an example sequence to describe what I mean:
> 1. open file, write one and a half blocks
> 2. call hflush
> 3. another reader asks for the first byte of the second block
> {quote}
> In this case since offset is greater than the completed block length, the math in getBlockRange()
of DFSInputStreamer.java will set "length" to negative.

This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

View raw message