hadoop-hdfs-issues mailing list archives

From "Todd Lipcon (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HDFS-1057) Concurrent readers hit ChecksumExceptions if following a writer to very end of file
Date Thu, 08 Apr 2010 17:27:36 GMT

    [ https://issues.apache.org/jira/browse/HDFS-1057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12855029#action_12855029 ]

Todd Lipcon commented on HDFS-1057:
-----------------------------------

Hey Sam,

Thanks for posting some thoughts here. I've been thinking about this one over the past couple
of weeks and don't have any particularly great solutions either :-/

bq. 1. truncate in progress blocks to chunk boundaries. This solved this problem, but fails
as technically sync'd data is not available (partial chunk at the end)

I agree this isn't very acceptable - a common use case is following an edit log, and we don't
want to lose the last 512 bytes of edits from the reader.

bq. when reader's request results in an artificial partial chunk (in BlockSender), recompute
the checksum for the partial chunk in the packet

So then we're throwing away disk checksumming on the last chunk of a file? I thought of this
one too, and I think it's OK, though it's not great. I think so long as we only do this when
the file is known to be Under Construction, it's acceptable. We'll have to be careful, though,
about races between checking whether we're reading the last checksum chunk and appending more
to the file.
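
To make the recompute idea concrete, here is a minimal sketch of checksumming only the prefix of the last chunk that the reader is allowed to see. The class and method names are hypothetical (this is not the actual BlockSender code), and the 512-byte chunk size and plain CRC32 are assumptions taken from the discussion above.

```java
import java.util.zip.CRC32;

/** Hypothetical helper for illustration; not part of BlockSender. */
final class PartialChunkChecksum {
    // Assumed checksum chunk size (the "last 512 bytes" mentioned above).
    static final int BYTES_PER_CHECKSUM = 512;

    /**
     * Computes a CRC32 over the first visibleLen bytes of the last chunk,
     * instead of trusting the on-disk checksum, which may cover more bytes.
     */
    static long checksumForVisiblePrefix(byte[] chunk, int visibleLen) {
        if (visibleLen < 0 || visibleLen > chunk.length
                || visibleLen > BYTES_PER_CHECKSUM) {
            throw new IllegalArgumentException("visibleLen out of range: " + visibleLen);
        }
        CRC32 crc = new CRC32();
        crc.update(chunk, 0, visibleLen);
        return crc.getValue();
    }
}
```

Note that the sketch does nothing about the race mentioned above: the caller would still have to decide, atomically with respect to the writer, whether the chunk really is the last one of an under-construction file.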

bq. have datanode 'refresh' the length when it actually starts reading

Not sure I follow - as you noted, it still seems to have the potential for bugs.


The other idea I had that isn't fully fleshed out is this:
- The DN already knows which blocks are "in progress"
- For each "in progress" block, we add a lock _updatingLastChecksumChunkLock_.
- When the writer detects that it's appending mid-checksum-chunk to the last chunk, it takes
this lock before doing the rewind and rewrite of meta.
- The writer releases the lock after writing both the checksum and the data
- When the reader is reading a range that includes the last checksum chunk of an under-construction
file, it needs to acquire this lock first, read both the checksum and data, then release
the lock.
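
The locking steps above can be sketched roughly as follows. This is an in-memory stand-in, not DataNode code: the class InProgressLastChunk, its fields, and the Snapshot type are hypothetical, and the real thing would be rewinding and rewriting the meta file rather than updating arrays.

```java
import java.util.Arrays;
import java.util.concurrent.locks.ReentrantLock;
import java.util.zip.CRC32;

/** Hypothetical model of the last partial checksum chunk of one in-progress block. */
final class InProgressLastChunk {
    private final ReentrantLock updatingLastChecksumChunkLock = new ReentrantLock();
    private final byte[] data = new byte[512]; // assumed chunk size
    private int len = 0;
    private long checksum = 0L; // CRC32 of zero bytes is 0

    /** Consistent (data, checksum) pair as seen by a reader. */
    static final class Snapshot {
        final byte[] data;
        final long checksum;
        Snapshot(byte[] data, long checksum) {
            this.data = data;
            this.checksum = checksum;
        }
    }

    /** Writer: append mid-chunk, rewriting data and checksum under the lock. */
    void append(byte[] src) {
        updatingLastChecksumChunkLock.lock();
        try {
            if (len + src.length > data.length) {
                throw new IllegalArgumentException("append crosses chunk boundary");
            }
            System.arraycopy(src, 0, data, len, src.length);
            len += src.length;
            CRC32 crc = new CRC32();
            crc.update(data, 0, len);
            checksum = crc.getValue(); // released only after both are updated
        } finally {
            updatingLastChecksumChunkLock.unlock();
        }
    }

    /** Reader: take the same lock, read both checksum and data, then release. */
    Snapshot read() {
        updatingLastChecksumChunkLock.lock();
        try {
            return new Snapshot(Arrays.copyOf(data, len), checksum);
        } finally {
            updatingLastChecksumChunkLock.unlock();
        }
    }
}
```

Because the reader copies the data and the checksum while holding the lock, it never sees a checksum that covers more (or fewer) bytes than the data it read. The under-construction check and failure recovery are outside what this sketch covers.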

Couple potential issues:
- We'll have to be careful about races between the under-construction check and the read -
perhaps some more coordination with FSDataset is necessary here (we already have to coordinate
this a little bit to figure out whether to read out of the rbw directory or the finalized
directory, right?)
- We still have a potential issue about failure recovery - I think we need to add some code
to the DN recovery that cleans up the last partial chunk in RBW replicas.

One other more complicated idea:
- Next to each RBW file, in addition to having the meta/CRC32 file, we can add a blk_N.rbwMetaData
file.
- We can keep some extra state in this file about the last partial checksum chunk, and use
it for getting consistent reads.

> Concurrent readers hit ChecksumExceptions if following a writer to very end of file
> -----------------------------------------------------------------------------------
>
>                 Key: HDFS-1057
>                 URL: https://issues.apache.org/jira/browse/HDFS-1057
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: data-node
>    Affects Versions: 0.21.0, 0.22.0
>            Reporter: Todd Lipcon
>            Priority: Critical
>
> In BlockReceiver.receivePacket, it calls replicaInfo.setBytesOnDisk before calling flush().
> Therefore, if there is a concurrent reader, it's possible to race here - the reader will see
> the new length while those bytes are still in the buffers of BlockReceiver. Thus the client
> will potentially see checksum errors or EOFs. Additionally, the last checksum chunk of the
> file is made accessible to readers even though it is not stable.
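
The ordering problem in the description can be illustrated with a small sketch. Everything here is a hypothetical stand-in (ReplicaWriterSketch, an in-memory "disk", a single writer thread); it only shows one way to close the first race, by publishing the new length after the flush rather than before it.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

/** Hypothetical stand-in for the receive path; not the real BlockReceiver. */
final class ReplicaWriterSketch {
    // In-memory stand-in for the block file on disk.
    private final ByteArrayOutputStream disk = new ByteArrayOutputStream();
    // Buffered stream: written bytes sit here until flush() is called.
    private final OutputStream out = new BufferedOutputStream(disk, 4096);
    // Length that concurrent readers consult; assumes a single writer thread.
    private volatile long bytesOnDisk = 0;

    void receivePacket(byte[] packet) {
        try {
            out.write(packet);
            out.flush();                 // 1. push the bytes out of the buffer
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        bytesOnDisk += packet.length;    // 2. only then publish the new length
    }

    long getBytesOnDisk() {
        return bytesOnDisk;
    }

    int bytesActuallyWritten() {
        return disk.size();
    }
}
```

With this ordering a reader that observes the new length can rely on those bytes having left the writer's buffer. It does not address the second point in the description, that the last checksum chunk is still unstable while the writer keeps appending to it.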

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

