Date: Thu, 9 Jan 2014 21:43:50 +0000 (UTC)
From: "Colin Patrick McCabe (JIRA)"
To: hdfs-issues@hadoop.apache.org
Reply-To: hdfs-issues@hadoop.apache.org
Subject: [jira] [Commented] (HDFS-5182) BlockReaderLocal must allow zero-copy reads only when the DN believes it's valid

    [ https://issues.apache.org/jira/browse/HDFS-5182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13867110#comment-13867110 ]

Colin Patrick McCabe commented on HDFS-5182:
--------------------------------------------

bq. That seems much longer than necessary – don't we want clients to be able to keep mmaps around in their cache for very long periods of time? And then, when the user requests the read, we can "anchor" the mmap only for the duration of time for which the user holds onto the zero-copy buffer?
Once the user returns the zero-copy buffer, we can decrement the count and allow the DN to evict the block from the cache.

Sorry, I was unclear. When I said "closed" I meant that the user had returned the zero-copy buffer. So the same thing you suggested.

bq. I disagree on this. Just because you want to skip checksumming doesn't mean you can tolerate SIGBUS. For example, many file formats have their own checksums, so we can safely skip HDFS checksumming, but we still want to ensure that we're only reading locked (i.e., safe) memory via mmap.

What I was referring to here is the case where a client has specifically requested an mmap region using the zero-copy API and the SKIP_CHECKSUMS option. In that case, the user is clearly going to be reading without any guarantees from us. If the user just uses the normal (non-zero-copy, non-mmap) read path, SIGBUS will not be an issue.

(There have been some proposals to improve the SIGBUS situation for zero-copy reads without mlock, but they're certainly out of scope for this JIRA.)

bq. Maybe this can be put into a separate JIRA, and first implement just a very simple timeout-based mechanism? The DN could change the anchor flag to a magic value which invalidates the segment and then close it after some amount of time. Then if the client looks at it again it will know to invalidate.

Timeouts and two-way protocols get complex. I already have the code for closing the shared memory segment based on listening for the remote socket getting closed. As for where the socket comes from: we just don't put the socket we used to get the FDs in the first place back into the peer cache.
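
To make the anchoring idea discussed above concrete, here is a minimal sketch of an anchor count guarded by a DN-controlled flag. It is only an illustration under stated assumptions, not the code from the patch: the class name AnchorSlot, the bit layout, and all method names are hypothetical, and a plain in-process AtomicLong stands in for a word in the shared memory segment.

```java
import java.util.concurrent.atomic.AtomicLong;

/**
 * Hypothetical slot shared by a client and the DN (illustrative only).
 * One 64-bit word holds an "anchorable" flag plus the anchor count.
 */
final class AnchorSlot {
    // Flag bit: set while the DN believes the block is locked into memory.
    private static final long ANCHORABLE = 1L << 62;
    // Remaining low bits: number of zero-copy buffers currently held by users.
    private static final long COUNT_MASK = ANCHORABLE - 1;

    private final AtomicLong state = new AtomicLong(ANCHORABLE);

    /** Client side: try to anchor before handing out a zero-copy buffer. */
    boolean addAnchor() {
        while (true) {
            long cur = state.get();
            if ((cur & ANCHORABLE) == 0) {
                return false; // the DN no longer vouches for this mapping
            }
            if (state.compareAndSet(cur, cur + 1)) {
                return true;
            }
        }
    }

    /** Client side: the user returned the zero-copy buffer. */
    void removeAnchor() {
        while (true) {
            long cur = state.get();
            if ((cur & COUNT_MASK) == 0) {
                throw new IllegalStateException("no anchors to remove");
            }
            if (state.compareAndSet(cur, cur - 1)) {
                return;
            }
        }
    }

    /**
     * DN side: refuse new anchors from now on.
     * Returns true if no anchors were outstanding at that moment.
     */
    boolean makeUnanchorable() {
        long cur;
        do {
            cur = state.get();
        } while (!state.compareAndSet(cur, cur & ~ANCHORABLE));
        return (cur & COUNT_MASK) == 0;
    }

    /** DN side: safe to evict only when unanchorable and the count is zero. */
    boolean isEvictable() {
        long cur = state.get();
        return (cur & ANCHORABLE) == 0 && (cur & COUNT_MASK) == 0;
    }
}
```

In this sketch the mmap itself can stay in the client's cache indefinitely; only the anchor, held for as long as the user holds the buffer, delays eviction on the DN side.
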
> BlockReaderLocal must allow zero-copy reads only when the DN believes it's valid
> ---------------------------------------------------------------------------------
>
>                 Key: HDFS-5182
>                 URL: https://issues.apache.org/jira/browse/HDFS-5182
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: hdfs-client
>    Affects Versions: 3.0.0
>            Reporter: Colin Patrick McCabe
>            Assignee: Colin Patrick McCabe
>
> BlockReaderLocal must allow zero-copy reads only when the DN believes it's valid. This implies adding a new field to the response to REQUEST_SHORT_CIRCUIT_FDS. We also need some kind of heartbeat from the client to the DN, so that the DN can inform the client when the mapped region is no longer locked into memory.

--
This message was sent by Atlassian JIRA
(v6.1.5#6160)