hadoop-hdfs-issues mailing list archives

From "Colin Patrick McCabe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-5182) BlockReaderLocal must allow zero-copy reads only when the DN believes it's valid
Date Thu, 09 Jan 2014 21:43:50 GMT

    [ https://issues.apache.org/jira/browse/HDFS-5182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13867110#comment-13867110 ]

Colin Patrick McCabe commented on HDFS-5182:

bq. That seems much longer than necessary – don't we want clients to be able to keep mmaps
around in their cache for very long periods of time? And then, when the user requests the
read, we can "anchor" the mmap only for the duration of time for which the user holds onto
the zero-copy buffer? Once the user returns the zero-copy buffer, we can decrement the count
and allow the DN to evict the block from the cache.

Sorry, I was unclear.  When I said "closed", I meant that the user had returned the zero-copy
buffer.  So it's the same thing you suggested.
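
For illustration, here is a minimal sketch of the anchoring scheme described above, assuming a simple per-block counter. The class and method names (AnchoredBlock, anchor, unanchor, tryEvict) are hypothetical and are not taken from the HDFS code; the real mechanism shares state between client and DN, which this sketch does not model.

```java
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical sketch of per-block anchoring; not the HDFS implementation.
 * The block may be evicted only while no zero-copy buffer is outstanding.
 */
final class AnchoredBlock {
    // Sentinel value marking a block that has been evicted.
    private static final int EVICTED = -1;

    // Number of zero-copy buffers currently held by users.
    private final AtomicInteger anchors = new AtomicInteger(0);

    /** Called when a user requests a zero-copy read. Returns false if already evicted. */
    boolean anchor() {
        while (true) {
            int cur = anchors.get();
            if (cur == EVICTED) {
                return false;
            }
            if (anchors.compareAndSet(cur, cur + 1)) {
                return true;
            }
        }
    }

    /** Called when the user returns the zero-copy buffer. */
    void unanchor() {
        while (true) {
            int cur = anchors.get();
            if (cur <= 0) {
                throw new IllegalStateException("unanchor without matching anchor");
            }
            if (anchors.compareAndSet(cur, cur - 1)) {
                return;
            }
        }
    }

    /** Called by the cache owner; succeeds only when nothing is anchored. */
    boolean tryEvict() {
        return anchors.compareAndSet(0, EVICTED);
    }
}
```

In this sketch the mmap itself can stay cached for as long as the client likes; only the interval between anchor() and unanchor() blocks eviction.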

bq. I disagree on this. Just because you want to skip checksumming doesn't mean you can tolerate
SIGBUS. For example, many file formats have their own checksums, so we can safely skip HDFS
checksumming, but we still want to ensure that we're only reading locked (i.e., safe) memory
via mmap.

What I was referring to here is where a client has specifically requested an mmap region using
the zero-copy API and the SKIP_CHECKSUMS option.  In that case, the user is clearly going
to be reading without any guarantees from us.  If the user just uses the normal (non-zero-copy,
non-mmap) read path, SIGBUS will not be an issue.

(There have been some proposals to improve the SIGBUS situation for zero-copy reads without
mlock, but they're certainly out of scope for this JIRA.)

bq. Maybe this can be put into a separate JIRA, and first implement just a very simple timeout-based
mechanism? The DN could change the anchor flag to a magic value which invalidates the segment
and then close it after some amount of time. Then if the client looks at it again it will
know to invalidate.

Timeouts and two-way protocols get complex.  I already have the code for closing the shared
memory segment based on listening for the remote socket getting closed.  As for where the
socket comes from: we just don't put the socket we used to get the FDs in the first place
back into the peer cache.
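
A rough sketch of the close-on-disconnect idea, for illustration only. It uses a plain java.net.Socket and a blocking read on a watcher thread; the actual code path uses the socket over which the FDs were passed, and SharedSegment and SocketCloseWatcher are hypothetical names invented for this example.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.Socket;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical stand-in for a shared memory segment. */
final class SharedSegment {
    private final AtomicBoolean closed = new AtomicBoolean(false);

    void close() {
        closed.set(true);
    }

    boolean isClosed() {
        return closed.get();
    }
}

/** Hypothetical watcher: releases the segment once the remote end goes away. */
final class SocketCloseWatcher {
    /** Starts a daemon thread that closes the segment when the peer closes the socket. */
    static Thread watch(Socket sock, SharedSegment segment) {
        Thread t = new Thread(() -> {
            try (Socket s = sock) {
                InputStream in = s.getInputStream();
                // read() returns -1 at end of stream, i.e. once the peer has closed its side.
                while (in.read() != -1) {
                    // Ignore stray bytes; in this sketch the socket carries no further data.
                }
            } catch (IOException e) {
                // Treat errors such as a connection reset the same as a close.
            } finally {
                segment.close();
            }
        }, "segment-watcher");
        t.setDaemon(true);
        t.start();
        return t;
    }
}
```

The design point is that no timeout or extra message is needed: the socket is kept out of the peer cache, so its closure is an unambiguous signal that the client is gone.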

> BlockReaderLocal must allow zero-copy reads only when the DN believes it's valid
> ---------------------------------------------------------------------------------
>                 Key: HDFS-5182
>                 URL: https://issues.apache.org/jira/browse/HDFS-5182
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: hdfs-client
>    Affects Versions: 3.0.0
>            Reporter: Colin Patrick McCabe
>            Assignee: Colin Patrick McCabe
> BlockReaderLocal must allow zero-copy reads only when the DN believes it's valid.  This
implies adding a new field to the response to REQUEST_SHORT_CIRCUIT_FDS.  We also need some
kind of heartbeat from the client to the DN, so that the DN can inform the client when the
mapped region is no longer locked into memory.

This message was sent by Atlassian JIRA
