X-Original-To: apmail-hadoop-hdfs-issues-archive@minotaur.apache.org
Delivered-To: apmail-hadoop-hdfs-issues-archive@minotaur.apache.org
Received: from mail.apache.org (hermes.apache.org [140.211.11.3])
    by minotaur.apache.org (Postfix) with SMTP id 7CFCF10874
    for ; Wed, 31 Jul 2013 17:57:52 +0000 (UTC)
Received: (qmail 6478 invoked by uid 500); 31 Jul 2013 17:57:51 -0000
Delivered-To: apmail-hadoop-hdfs-issues-archive@hadoop.apache.org
Received: (qmail 6383 invoked by uid 500); 31 Jul 2013 17:57:51 -0000
Mailing-List: contact hdfs-issues-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: hdfs-issues@hadoop.apache.org
Delivered-To: mailing list hdfs-issues@hadoop.apache.org
Received: (qmail 6355 invoked by uid 99); 31 Jul 2013 17:57:51 -0000
Received: from arcas.apache.org (HELO arcas.apache.org) (140.211.11.28)
    by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 31 Jul 2013 17:57:51 +0000
Date: Wed, 31 Jul 2013 17:57:51 +0000 (UTC)
From: "Colin Patrick McCabe (JIRA)"
To: hdfs-issues@hadoop.apache.org
Subject: [jira] [Commented] (HDFS-4953) enable HDFS local reads via mmap
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394

    [ https://issues.apache.org/jira/browse/HDFS-4953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13725502#comment-13725502 ]

Colin Patrick McCabe commented on HDFS-4953:
--------------------------------------------

Just to comment on phantom references (PRs) specifically: I do believe that PRs are better than finalizers. But they still suffer from the same problem of keeping the object around for longer than it needs to be. In this case, that translates into keeping the mmap around for longer than it needs to be.
That consumes page table entries and may prevent us from unmapping a memory map that has not been used for a long time. This in turn may lead to us not creating another mmap that we should have created, because the cache is full, and so on.

My understanding of PRs (correct me if I'm wrong) is that you generally have to have a thread that keeps polling the PR to see if it's ready to be disposed of. This is extra overhead for the users who do remember to correctly call {{close()}}.

> enable HDFS local reads via mmap
> --------------------------------
>
>                 Key: HDFS-4953
>                 URL: https://issues.apache.org/jira/browse/HDFS-4953
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>   Affects Versions: 2.3.0
>            Reporter: Colin Patrick McCabe
>            Assignee: Colin Patrick McCabe
>         Attachments: benchmark.png, HDFS-4953.001.patch, HDFS-4953.002.patch, HDFS-4953.003.patch, HDFS-4953.004.patch, HDFS-4953.005.patch, HDFS-4953.006.patch
>
>
> Currently, the short-circuit local read pathway allows HDFS clients to access files directly without going through the DataNode. However, all of these reads involve a copy at the operating system level, since they rely on the read() / pread() / etc. family of kernel interfaces.
> We would like to enable HDFS to read local files via mmap. This would enable truly zero-copy reads.
> In the initial implementation, zero-copy reads will only be performed when checksums are disabled. Later, we can use the DataNode's cache awareness to only perform zero-copy reads when we know that the checksum has already been verified.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
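
For readers unfamiliar with the phantom-reference pattern discussed in the comment above, here is a minimal sketch of how it is commonly wired up in Java. It is not taken from any HDFS-4953 patch: the class {{PhantomCleanupSketch}} and its methods {{register}}, {{closeNow}}, and {{drain}} are hypothetical names chosen for illustration, and the cleanup action stands in for whatever would release the mmap. It uses only {{java.lang.ref.PhantomReference}} and {{java.lang.ref.ReferenceQueue}} from the JDK.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper (not part of HDFS): releases a resource either when the
// caller closes it explicitly or after its owner has been garbage collected.
class PhantomCleanupSketch {
    private final ReferenceQueue<Object> queue = new ReferenceQueue<>();
    // Keeps each PhantomReference strongly reachable until its cleanup has run;
    // otherwise the reference itself could be collected and never enqueued.
    private final Map<Reference<?>, Runnable> pending = new ConcurrentHashMap<>();

    // Track an owner. The cleanup action must not capture the owner, or the
    // owner would never become unreachable.
    Reference<?> register(Object owner, Runnable cleanup) {
        PhantomReference<Object> ref = new PhantomReference<>(owner, queue);
        pending.put(ref, cleanup);
        return ref;
    }

    // Explicit close() path: release immediately and stop tracking.
    boolean closeNow(Reference<?> ref) {
        Runnable cleanup = pending.remove(ref);
        if (cleanup == null) {
            return false; // already released
        }
        ref.clear(); // a cleared reference is not enqueued by the collector
        cleanup.run();
        return true;
    }

    // One polling pass over the queue. A background thread would normally
    // call this in a loop, which is the overhead mentioned in the comment.
    int drain() {
        int released = 0;
        Reference<?> ref;
        while ((ref = queue.poll()) != null) {
            Runnable cleanup = pending.remove(ref);
            if (cleanup != null) {
                cleanup.run();
                released++;
            }
        }
        return released;
    }

    public static void main(String[] args) {
        PhantomCleanupSketch tracker = new PhantomCleanupSketch();
        AtomicInteger released = new AtomicInteger();
        Object owner = new Object();
        Reference<?> ref = tracker.register(owner, released::incrementAndGet);
        // The well-behaved caller closes explicitly, so nothing waits on GC.
        tracker.closeNow(ref);
        System.out.println("released after close: " + released.get());
        System.out.println("left for the poller: " + tracker.drain());
    }
}
```

In this sketch the explicit {{closeNow}} path releases the resource right away, while an owner that is simply dropped is only released after the collector notices it and some thread calls {{drain}}. That delay is the "keeping the mmap around for longer than it needs to be" concern. On Java 9 and later, {{java.lang.ref.Cleaner}} packages the same pattern together with its own thread.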