hadoop-hdfs-issues mailing list archives

From "Todd Lipcon (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HDFS-1058) reading from file under construction fails if it reader beats writer to DN for new block
Date Tue, 23 Mar 2010 06:20:27 GMT

    [ https://issues.apache.org/jira/browse/HDFS-1058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12848561#action_12848561 ]

Todd Lipcon commented on HDFS-1058:
-----------------------------------

I think the fix for this is simple enough.

If a client calls getReplicaVisibleLength on a DN, and the DN replies with a ReplicaNotFoundException,
then either:
a) the writer just hasn't started writing the block to that DN yet,
or b) the reader has stale block locations

In order to separate (a) from (b) I think we can iterate through the DNs, and see if all of
the DNs have the same response. If they do, and the LocatedBlocks indicated a 0 length, then
I think we can safely just act the same as if they returned length 0. Stale block locations
are impossible (or at least very very unlikely) since we *just* called getBlockLocations from
the NN.

My only question is whether we need to actually iterate through all of the DNs, or if we can
just return 0 immediately on receiving ReplicaNotFoundException from the primary. I think
going through all of them is actually important, because there may have been a concurrent
pipeline recovery, in which case the old primary may have deleted the now-invalidated replica
(i.e., from one DN's response alone we can't distinguish between not having gotten the block
yet and having gotten and already deleted the block).
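
To make the proposal concrete, here is a rough sketch of the client-side logic, not an
actual patch. The Datanode interface, the local ReplicaNotFoundException class, and the
visibleLength method are hypothetical stand-ins for illustration, not the real HDFS client
API. It also assumes that if any DN does report a length, the reader simply uses the first
one reported, which the discussion above does not settle.

```java
import java.io.IOException;
import java.util.List;

public class VisibleLengthSketch {

    /** Hypothetical stand-in for the DN's "replica not found" error. */
    static class ReplicaNotFoundException extends IOException {
        ReplicaNotFoundException(String msg) {
            super(msg);
        }
    }

    /** Hypothetical stand-in for the RPC a reader makes to one DN. */
    interface Datanode {
        long getReplicaVisibleLength() throws ReplicaNotFoundException;
    }

    /**
     * Asks each DN in turn for the visible length of the last block.
     *
     * @param datanodes     locations just returned by the NN (assumed fresh)
     * @param locatedLength block length the NN reported in LocatedBlocks
     */
    static long visibleLength(List<Datanode> datanodes, long locatedLength)
            throws ReplicaNotFoundException {
        int notFound = 0;
        for (Datanode dn : datanodes) {
            try {
                // Assumption: the first DN that knows the replica wins.
                return dn.getReplicaVisibleLength();
            } catch (ReplicaNotFoundException e) {
                // Do not give up on the first miss: this DN may have
                // dropped its replica during a pipeline recovery.
                notFound++;
            }
        }
        // Every DN gave the same "not found" answer and the NN says the
        // block is empty: treat it as a block nobody has written to yet.
        if (!datanodes.isEmpty() && notFound == datanodes.size()
                && locatedLength == 0) {
            return 0;
        }
        throw new ReplicaNotFoundException(
                "no DN has the replica and NN length is " + locatedLength);
    }
}
```

With this shape, returning 0 requires agreement from every DN plus a 0 length from the NN,
so a single DN that lost its replica cannot by itself make the block look empty.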

Does this sound right? I will work on a patch if it does.

> reading from file under construction fails if it reader beats writer to DN for new block
> ----------------------------------------------------------------------------------------
>
>                 Key: HDFS-1058
>                 URL: https://issues.apache.org/jira/browse/HDFS-1058
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: data-node, hdfs client
>    Affects Versions: 0.21.0, 0.22.0
>            Reporter: Todd Lipcon
>
> If there is a writer and concurrent reader, the following can occur:
> - The writer allocates a new block from the NN
> - The reader calls getBlockLocations
> - Reader connects to the DN and calls getReplicaVisibleLength
> - writer still has not talked to the DN, so DN doesn't know about the block and throws an error

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

