hadoop-hdfs-issues mailing list archives

From "Todd Lipcon (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HDFS-1058) reading from file under construction fails if it reader beats writer to DN for new block
Date Tue, 23 Mar 2010 06:20:27 GMT

    [ https://issues.apache.org/jira/browse/HDFS-1058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12848561#action_12848561 ]

Todd Lipcon commented on HDFS-1058:

I think the fix for this is simple enough.

If a client calls getReplicaVisibleLength on a DN and the DN replies with a ReplicaNotFoundException,
then either:
a) the writer just hasn't started writing the block yet, or
b) the reader has stale block locations.

To separate (a) from (b), I think we can iterate through the DNs and see whether all of
them give the same response. If they do, and the LocatedBlocks indicated a length of 0, then
I think we can safely act as if they had returned length 0. Stale block locations
are impossible (or at least very unlikely) since we *just* called getBlockLocations on
the NN.

My only question is whether we actually need to iterate through all of the DNs, or whether we can
just return 0 immediately on receiving ReplicaNotFoundException from the primary. I think
going through all of them is actually important, because there may have been a concurrent pipeline
recovery, in which case the old primary may have deleted the now-invalidated replica (i.e.,
from a single DN's response we can't distinguish between not having gotten the block yet and
having gotten and already deleted the block).

Does this sound right? I will work on a patch if it does.
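
A rough sketch of the logic proposed above, not the actual patch. The Datanode interface
and the ReplicaNotFoundException class here are hypothetical stand-ins for the real HDFS
types, and readVisibleLength is an invented name. Two behaviors are assumptions the
comment does not spell out: a DN that does return a length is trusted immediately, and any
other IOException from a DN is treated as "responses differ", so the read fails.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

class VisibleLengthSketch {

    /** Hypothetical stand-in for the DN's "I don't know this replica" error. */
    static class ReplicaNotFoundException extends IOException {
        ReplicaNotFoundException(long blockId) {
            super("Replica not found for block " + blockId);
        }
    }

    /** Hypothetical stand-in for the client-to-DN call. */
    interface Datanode {
        long getReplicaVisibleLength(long blockId) throws IOException;
    }

    /**
     * Ask every DN in the block's locations for the visible length.
     * Returns 0 only if ALL of them report ReplicaNotFoundException and the
     * length reported by the NN (from LocatedBlocks) was 0.
     */
    static long readVisibleLength(long blockId, long lengthFromNN,
                                  List<Datanode> locations) throws IOException {
        int notFound = 0;
        IOException lastFailure = null;
        for (Datanode dn : locations) {
            try {
                // Assumption: the first DN that knows the replica is trusted.
                return dn.getReplicaVisibleLength(blockId);
            } catch (ReplicaNotFoundException e) {
                notFound++;          // writer may not have reached this DN yet
                lastFailure = e;
            } catch (IOException e) {
                lastFailure = e;     // a different response: be conservative
            }
        }
        boolean allNotFound = !locations.isEmpty() && notFound == locations.size();
        if (allNotFound && lengthFromNN == 0) {
            // Case (a): the block was just allocated and nothing is written yet.
            return 0;
        }
        throw new IOException(
            "Cannot obtain visible length for block " + blockId, lastFailure);
    }

    public static void main(String[] args) throws IOException {
        Datanode missing = id -> { throw new ReplicaNotFoundException(id); };
        List<Datanode> pipeline = Arrays.asList(missing, missing, missing);
        System.out.println(readVisibleLength(1L, 0L, pipeline));
    }
}
```

Checking every DN rather than only the primary is what covers the concurrent
pipeline-recovery case: a single ReplicaNotFoundException is ambiguous, while the same
answer from the whole pipeline plus a 0 length from the NN is not.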

> reading from file under construction fails if it reader beats writer to DN for new block
> ----------------------------------------------------------------------------------------
>                 Key: HDFS-1058
>                 URL: https://issues.apache.org/jira/browse/HDFS-1058
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: data-node, hdfs client
>    Affects Versions: 0.21.0, 0.22.0
>            Reporter: Todd Lipcon
> If there is a writer and concurrent reader, the following can occur:
> - The writer allocates a new block from the NN
> - The reader calls getBlockLocations
> - Reader connects to the DN and calls getReplicaVisibleLength
> - writer still has not talked to the DN, so DN doesn't know about the block and throws an error

