Mailing-List: contact hdfs-issues-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: hdfs-issues@hadoop.apache.org
Date: Thu, 9 Jan 2014 21:43:50 +0000 (UTC)
From: "Colin Patrick McCabe (JIRA)" <jira@apache.org>
To: hdfs-issues@hadoop.apache.org
Message-ID: <JIRA.12667954.1378857814548.71651.1389303830947@arcas>
In-Reply-To: <JIRA.12667954.1378857814548@arcas>
References: <JIRA.12667954.1378857814548@arcas>
Subject: [jira] [Commented] (HDFS-5182) BlockReaderLocal must allow
 zero-copy  reads only when the DN believes it's valid
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable


    [ https://issues.apache.org/jira/browse/HDFS-5182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13867110#comment-13867110 ]

Colin Patrick McCabe commented on HDFS-5182:
--------------------------------------------

bq. That seems much longer than necessary – don't we want clients to be able to keep mmaps around in their cache for very long periods of time? And then, when the user requests the read, we can "anchor" the mmap only for the duration of time for which the user holds onto the zero-copy buffer? Once the user returns the zero-copy buffer, we can decrement the count and allow the DN to evict the block from the cache.

Sorry, I was unclear.  When I said "closed", I meant that the user had returned the zero-copy buffer.  So it is the same thing you suggested.
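
To make the anchoring idea concrete, here is a minimal in-process sketch of the bookkeeping, not the actual patch.  The class and method names (AnchorSlot, tryAnchor, unanchor, revoke, isEvictable) are made up for illustration, and the real counter would have to live in memory shared between the client and the DN rather than behind a Java monitor.

```java
/**
 * Hypothetical model of per-block anchoring. A client anchors the block
 * while a zero-copy buffer is outstanding; the DN may evict the block only
 * after it has revoked anchoring and all outstanding anchors are returned.
 */
class AnchorSlot {
    // Number of zero-copy buffers currently held by users.
    private int anchors = 0;
    // Cleared by the DN side when it wants to evict the block.
    private boolean anchorable = true;

    /** Client side: called before handing a zero-copy buffer to the user. */
    synchronized boolean tryAnchor() {
        if (!anchorable) {
            return false; // caller must fall back to a normal (copying) read
        }
        anchors++;
        return true;
    }

    /** Client side: called when the user returns the zero-copy buffer. */
    synchronized void unanchor() {
        if (anchors == 0) {
            throw new IllegalStateException("unanchor without matching anchor");
        }
        anchors--;
    }

    /** DN side: stop new anchors from being taken. */
    synchronized void revoke() {
        anchorable = false;
    }

    /** DN side: true once no user holds a zero-copy buffer any more. */
    synchronized boolean isEvictable() {
        return !anchorable && anchors == 0;
    }
}
```

The cached mmap itself can stay around for as long as the client likes; only the time between tryAnchor() and unanchor() blocks eviction.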

bq. I disagree on this. Just because you want to skip checksumming doesn't mean you can tolerate SIGBUS. For example, many file formats have their own checksums, so we can safely skip HDFS checksumming, but we still want to ensure that we're only reading locked (i.e., safe) memory via mmap.

What I was referring to here is where a client has specifically requested an mmap region using the zero-copy API and the SKIP_CHECKSUMS option.  In that case, the user is clearly going to be reading without any guarantees from us.  If the user just uses the normal (non-zero-copy, non-mmap) read path, SIGBUS will not be an issue.

(There have been some proposals to improve the SIGBUS situation for zero-copy reads without mlock, but they're certainly out of scope for this JIRA.)

bq. Maybe this can be put into a separate JIRA, and first implement just a very simple timeout-based mechanism? The DN could change the anchor flag to a magic value which invalidates the segment and then close it after some amount of time. Then if the client looks at it again it will know to invalidate.

Timeouts and two-way protocols get complex.  I already have the code for closing the shared memory segment based on listening for the remote socket getting closed.  As for where the socket comes from-- we just don't put the socket we used to get the FDs in the first place back into the peer cache.
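
As a rough illustration of the "close on socket close" approach (a sketch under assumptions, not the code referred to above): the DN side can treat the connection purely as a liveness signal and release the segment once a read reports end-of-stream.  SegmentWatcher and releaseWhenPeerCloses are hypothetical names, and a plain InputStream stands in for the socket's input side.

```java
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical helper: release a shared memory segment when the peer goes away. */
class SegmentWatcher {
    /**
     * Blocks until the stream from the client reaches end-of-stream or
     * fails, then runs the supplied cleanup exactly once.
     */
    static void releaseWhenPeerCloses(InputStream fromClient, Runnable releaseSegment) {
        try {
            // read() returns -1 once the remote end has closed the connection.
            while (fromClient.read() != -1) {
                // Stray bytes are ignored; the connection is only a liveness signal.
            }
        } catch (IOException e) {
            // A broken connection is treated the same as a clean close.
        } finally {
            releaseSegment.run();
        }
    }
}
```

In practice this would run on its own thread per connection (or be replaced by a poll-based watcher over many sockets), since the call blocks for as long as the client keeps the socket open.  No timeout or acknowledgement from the client is needed, which is the point of avoiding a two-way protocol.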

> BlockReaderLocal must allow zero-copy  reads only when the DN believes it's valid
> ---------------------------------------------------------------------------------
>
>                 Key: HDFS-5182
>                 URL: https://issues.apache.org/jira/browse/HDFS-5182
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: hdfs-client
>    Affects Versions: 3.0.0
>            Reporter: Colin Patrick McCabe
>            Assignee: Colin Patrick McCabe
>
> BlockReaderLocal must allow zero-copy reads only when the DN believes it's valid.  This implies adding a new field to the response to REQUEST_SHORT_CIRCUIT_FDS.  We also need some kind of heartbeat from the client to the DN, so that the DN can inform the client when the mapped region is no longer locked into memory.


--
This message was sent by Atlassian JIRA
(v6.1.5#6160)