hadoop-hdfs-issues mailing list archives

From "Colin Patrick McCabe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-5182) BlockReaderLocal must allow zero-copy reads only when the DN believes it's valid
Date Thu, 09 Jan 2014 11:05:56 GMT

    [ https://issues.apache.org/jira/browse/HDFS-5182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13866545#comment-13866545 ]

Colin Patrick McCabe commented on HDFS-5182:
--------------------------------------------

A few notes about the planned implementation here:

The main idea is to have a shared memory segment which both the DFSClient and the DataNode
can read and write.  Before each read, the DFSClient will look at this shared memory segment
to see whether it can be "anchored."  A segment will be anchorable if the DataNode has mlocked
it.  If the segment can be anchored, the DFSClient will increment the anchor count.  Then
the client can read without validating the checksum.  When the client is done reading, it will
decrement the anchor count.  These are just memory operations, so they will be fast.
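
To make the anchoring handshake concrete, here is a minimal sketch.  It is not the planned
implementation: the class name AnchorSlot, the bit layout (top bit as the "anchorable" flag,
low 31 bits as the anchor count) and all method names are assumptions made for illustration,
and an AtomicLong stands in for a 64-bit word in the shared memory segment.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical model of the anchor state; an AtomicLong stands in for shared memory. */
final class AnchorSlot {
    // Assumed layout: top bit = "anchorable" flag, low 31 bits = anchor count.
    private static final long ANCHORABLE_FLAG = 1L << 63;
    private static final long COUNT_MASK = 0x7fffffffL;

    private final AtomicLong word = new AtomicLong(0);

    /** DataNode side: called once the data has been mlocked. */
    void makeAnchorable() {
        word.getAndUpdate(w -> w | ANCHORABLE_FLAG);
    }

    /** DataNode side: called when the data is about to be unlocked. */
    void makeUnanchorable() {
        word.getAndUpdate(w -> w & ~ANCHORABLE_FLAG);
    }

    /** Client side: returns true if an anchor was taken and checksums may be skipped. */
    boolean addAnchor() {
        while (true) {
            long prev = word.get();
            if ((prev & ANCHORABLE_FLAG) == 0) {
                return false; // not mlocked: the client must verify checksums
            }
            if ((prev & COUNT_MASK) == COUNT_MASK) {
                return false; // anchor count is saturated
            }
            if (word.compareAndSet(prev, prev + 1)) {
                return true;
            }
        }
    }

    /** Client side: releases an anchor taken by a successful addAnchor(). */
    void removeAnchor() {
        while (true) {
            long prev = word.get();
            if ((prev & COUNT_MASK) == 0) {
                throw new IllegalStateException("no anchor is held");
            }
            if (word.compareAndSet(prev, prev - 1)) {
                return;
            }
        }
    }

    int anchorCount() {
        return (int) (word.get() & COUNT_MASK);
    }
}
```

Keeping the flag and the count in a single word lets the client check "is it anchorable?" and
"increment the count" in one compare-and-set, so the DataNode cannot clear the flag between
the two steps.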

Similarly, when the client tries to do a zero-copy read, it will check whether the segment
is anchorable, and increment the anchor count before performing the mmap.  The anchor count
will stay incremented until the mmap is closed.  The one exception is when the client passes
the ReadOption.SKIP_CHECKSUMS flag.  In that case, we do not need to consult the anchor flag,
because we are willing to tolerate bad data being returned, or a SIGBUS.

Shared memory segments will have a fixed size and contain a series of fixed-size slots.  The
client will request a shared memory segment via the REQUEST_SHORT_CIRCUIT_FDS operation.
Of course, not every REQUEST_SHORT_CIRCUIT_FDS operation needs to get a new shared memory
segment, since each segment can hold multiple slots.  The client caches these segments and
requests a new one only when it needs one.  A segment will be closed when none of its slots
is in use any longer.
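
The slot bookkeeping described above could look roughly like the following sketch.  The sizes
(4096-byte segments, 64-byte slots) and the names ShmSegment, SlotRef and ShmSegmentCache are
assumptions for illustration only, and creating a new ShmSegment object stands in for asking
the DataNode for a new segment.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

/** Hypothetical fixed-size segment divided into fixed-size slots. */
final class ShmSegment {
    static final int SEGMENT_BYTES = 4096; // assumed size
    static final int SLOT_BYTES = 64;      // assumed size
    static final int SLOTS = SEGMENT_BYTES / SLOT_BYTES;

    private final BitSet inUse = new BitSet(SLOTS);

    /** Returns the index of a newly allocated slot, or -1 if the segment is full. */
    int allocSlot() {
        int idx = inUse.nextClearBit(0);
        if (idx >= SLOTS) {
            return -1;
        }
        inUse.set(idx);
        return idx;
    }

    void freeSlot(int idx) {
        if (!inUse.get(idx)) {
            throw new IllegalStateException("slot " + idx + " is not allocated");
        }
        inUse.clear(idx);
    }

    boolean isEmpty() {
        return inUse.isEmpty();
    }

    /** Byte offset of a slot within the segment. */
    static int slotOffset(int idx) {
        return idx * SLOT_BYTES;
    }
}

/** A slot together with the segment that holds it. */
final class SlotRef {
    final ShmSegment segment;
    final int index;

    SlotRef(ShmSegment segment, int index) {
        this.segment = segment;
        this.index = index;
    }
}

/** Hypothetical client-side cache: reuse cached segments, request one only when needed. */
final class ShmSegmentCache {
    private final List<ShmSegment> segments = new ArrayList<>();
    private int segmentsRequested = 0;

    SlotRef allocSlot() {
        for (ShmSegment seg : segments) {
            int idx = seg.allocSlot();
            if (idx >= 0) {
                return new SlotRef(seg, idx);
            }
        }
        ShmSegment fresh = new ShmSegment(); // stand-in for requesting a new segment
        segmentsRequested++;
        segments.add(fresh);
        return new SlotRef(fresh, fresh.allocSlot());
    }

    void freeSlot(SlotRef ref) {
        ref.segment.freeSlot(ref.index);
        if (ref.segment.isEmpty()) {
            segments.remove(ref.segment); // no slots in use: close the segment
        }
    }

    int segmentsRequested() {
        return segmentsRequested;
    }

    int openSegments() {
        return segments.size();
    }
}
```

With these assumed sizes a segment holds 64 slots, so a second segment is requested only on
the 65th allocation, and it is dropped again as soon as its last slot is freed.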

One issue with the shared memory segments discussed here is that when a client terminates,
the DataNode receives no notification that the shared memory segment it created is no longer
needed.  For this reason, each shared memory segment will have a domain socket associated
with it.  The only function of this socket is to cause a close notification to be sent to
the DataNode when the client closes (or vice versa).  (When a UNIX domain socket closes, the
remote end gets a close notification.)  The socket used will be the same socket on which
the REQUEST_SHORT_CIRCUIT_FDS that fetched the segment was performed.  We simply don't
put it back into the peer cache.
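
The close notification can be illustrated with a small sketch.  As a loudly stated
assumption, it uses a loopback TCP socket as a stand-in for the UNIX domain socket, since
both signal a peer close as end-of-stream on read; the class and method names are
hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

/** Hypothetical demo: the "DataNode" end learns that the "client" end has closed. */
final class CloseNotificationDemo {
    /** Returns true if the DataNode side observed end-of-stream after the client closed. */
    static boolean dataNodeSeesClientClose() {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            Socket client = new Socket(server.getInetAddress(), server.getLocalPort());
            try (Socket dataNodeSide = server.accept()) {
                dataNodeSide.setSoTimeout(5000); // fail with a timeout rather than hang
                client.close();                  // the client goes away
                // read() blocks until data arrives or the peer closes; -1 means closed.
                return dataNodeSide.getInputStream().read() == -1;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

No data ever needs to flow on the socket; the DataNode only waits for the end-of-stream and
can then release the segment tied to that connection.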

> BlockReaderLocal must allow zero-copy reads only when the DN believes it's valid
> ---------------------------------------------------------------------------------
>
>                 Key: HDFS-5182
>                 URL: https://issues.apache.org/jira/browse/HDFS-5182
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: hdfs-client
>    Affects Versions: 3.0.0
>            Reporter: Colin Patrick McCabe
>            Assignee: Colin Patrick McCabe
>
> BlockReaderLocal must allow zero-copy reads only when the DN believes it's valid.  This
> implies adding a new field to the response to REQUEST_SHORT_CIRCUIT_FDS.  We also need some
> kind of heartbeat from the client to the DN, so that the DN can inform the client when the
> mapped region is no longer locked into memory.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
