hadoop-hdfs-issues mailing list archives

From "Colin Patrick McCabe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-5182) BlockReaderLocal must allow zero-copy reads only when the DN believes it's valid
Date Tue, 03 Dec 2013 20:42:36 GMT

    [ https://issues.apache.org/jira/browse/HDFS-5182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13838159#comment-13838159 ]

Colin Patrick McCabe commented on HDFS-5182:

So, previously we discussed a few different ways for the {{DataNode}} to notify the {{DFSClient}}
about a change in the block's mlock status.

One way (let's call this choice #1) was using a shared memory segment.  This would take the
form of a third file descriptor passed from the {{DataNode}} to the {{DFSClient}}.  On Linux,
this would simply be a 4kb file from the {{/dev/shm}} filesystem, which is a {{tmpfs}} filesystem.
That filesystem is the best choice because it will not cause the file to be written back to disk
every {{dirty_writeback_centisecs}}.
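As a rough sketch of choice #1, creating and mapping such a segment with plain {{java.nio}} could look like the following (the class and method names here are invented for illustration; the real file would live under {{/dev/shm}}, while the sketch accepts any path so it runs anywhere):

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class ShmSegmentSketch {
    // Map a small file as a shared-memory segment.  In the scheme above the
    // file would sit on tmpfs, so its pages are never written back to disk.
    public static MappedByteBuffer createSegment(Path path, int size) throws IOException {
        try (FileChannel ch = FileChannel.open(path,
                StandardOpenOption.CREATE,
                StandardOpenOption.READ,
                StandardOpenOption.WRITE)) {
            // Mapping past the end of the file grows it to 'size' bytes,
            // and the mapping stays valid after the channel is closed.
            return ch.map(FileChannel.MapMode.READ_WRITE, 0, size);
        }
    }
}
```

The {{DataNode}} and the {{DFSClient}} would each map the same file; a flag byte written by one side is visible to the other, since both mappings share the same tmpfs pages.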

However, on looking into this further, I found some issues with this method.  There is no
way for the {{DataNode}} to know when the {{DFSClient}} has closed the file descriptor for
the shared memory area.  We could add some kind of protocol for keeping the area alive by
writing to an agreed-upon location, but that would add a fair amount of complexity, and might
be triggered accidentally in the case of a garbage collection event on the {{DFSClient}}.

Another issue is that there is no way for the {{DataNode}} to revoke access to this shared
memory segment.  If the {{DFSClient}} wants to hold on to it forever, leaking memory, it can
do that.  This opens a hole.  The client might not have UNIX permissions to grab space in
{{/dev/shm}}, but through this mechanism it can consume an arbitrary amount of space there.

The other way (let's call this choice #2) is for the client to keep open the domain socket
it used to request the two file descriptors.  If we can listen for messages sent on this socket,
we can have a truly edge-triggered notification method.  The messages can be as short as a
single byte, since we have very simple message needs.  This requires adding an epoll loop
to handle these notifications without consuming a whole thread per socket.
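A minimal sketch of that loop, assuming a {{java.nio}} {{Selector}} (which is backed by epoll on Linux) with a {{Pipe}} standing in for the client's domain socket; all names here are invented for illustration:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.ReadableByteChannel;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.util.Iterator;

public class NotificationLoopSketch {
    // Block until some registered channel has a one-byte message, then
    // return it.  One Selector watches many sockets, so a single thread
    // can serve all notification channels.
    public static byte awaitOneByte(Selector selector) throws IOException {
        while (true) {
            selector.select();  // waits on epoll under the hood on Linux
            Iterator<SelectionKey> it = selector.selectedKeys().iterator();
            while (it.hasNext()) {
                SelectionKey key = it.next();
                it.remove();
                if (key.isReadable()) {
                    ByteBuffer buf = ByteBuffer.allocate(1);
                    ((ReadableByteChannel) key.channel()).read(buf);
                    return buf.get(0);
                }
            }
        }
    }
}
```

The sender's side is just a one-byte write; on the receiving side each wakeup of {{awaitOneByte}} corresponds to one notification, with no dedicated thread per socket.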

Regardless of whether we go with choice #1 or #2, there are some other things that need to
be done.

* Right now, we don't allow {{BlockReaderLocal}} instances to share file descriptors with
each other.  However, this would be advisable, to avoid creating 100 pipes/shm areas when
someone re-opens the same file 100 times.  Doing this is actually an easy change (I wrote
and tested the patch already).

* We need to revise {{FileInputStreamCache}} to store the communication method (pipe or shared
memory area) which will be giving us notifications.  This cache also needs to get support
for dealing with mmap regions, and for BRL instances sharing FDs / mmaps.  I have a patch
which reworks this cache, but it's not quite done yet.

* {{BlockReaderLocal}} needs to get support for switching back and forth between honoring
checksums and not.  I have a patch which substantially reworks BRL to add this capability,
which I'm considering posting as a separate JIRA.
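The first two items could be pictured together as a reference-counted cache entry per block, recording which notification mechanism backs it; this is purely hypothetical naming, not the actual patch:

```java
import java.util.HashMap;
import java.util.Map;

public class StreamCacheSketch {
    enum Notifier { DOMAIN_SOCKET, SHARED_MEMORY }  // pipe vs. shm-area notifications

    static final class Entry {
        final Notifier notifier;   // where mlock-status updates will arrive
        int refCount = 1;          // BlockReaderLocal instances sharing the FDs / mmaps
        boolean closed = false;    // stand-in for the real descriptors being closed
        Entry(Notifier notifier) { this.notifier = notifier; }
    }

    private final Map<Long, Entry> entries = new HashMap<>();  // keyed by block ID

    // Re-opening the same block shares the existing FDs instead of creating
    // a fresh pipe or shm area per open.
    public synchronized Entry getOrCreate(long blockId, Notifier notifier) {
        Entry e = entries.get(blockId);
        if (e != null) {
            e.refCount++;
            return e;
        }
        e = new Entry(notifier);
        entries.put(blockId, e);
        return e;
    }

    // The descriptors are actually closed only when the last sharer releases.
    public synchronized void release(long blockId) {
        Entry e = entries.get(blockId);
        if (e != null && --e.refCount == 0) {
            e.closed = true;
            entries.remove(blockId);
        }
    }
}
```

With this shape, re-opening the same file 100 times costs 99 reference-count bumps rather than 99 extra pipes or shm areas.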

> BlockReaderLocal must allow zero-copy reads only when the DN believes it's valid
> ---------------------------------------------------------------------------------
>                 Key: HDFS-5182
>                 URL: https://issues.apache.org/jira/browse/HDFS-5182
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: hdfs-client
>    Affects Versions: 3.0.0
>            Reporter: Colin Patrick McCabe
>            Assignee: Colin Patrick McCabe
> BlockReaderLocal must allow zero-copy reads only when the DN believes it's valid.  This
implies adding a new field to the response to REQUEST_SHORT_CIRCUIT_FDS.  We also need some
kind of heartbeat from the client to the DN, so that the DN can inform the client when the
mapped region is no longer locked into memory.

This message was sent by Atlassian JIRA
