Date: Thu, 20 Nov 2014 06:58:35 +0000 (UTC)
From: "Lars Hofhansl (JIRA)"
To: hdfs-issues@hadoop.apache.org
Subject: [jira] [Commented] (HDFS-6735) A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream

[ https://issues.apache.org/jira/browse/HDFS-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14219072#comment-14219072 ]

Lars Hofhansl commented on HDFS-6735:
-------------------------------------

Thanks [~cmccabe]. "infoLock" is better. I'll fix the indentation later.
Let me have a look at tryReadZeroCopy again. I had mapped out all members and which methods use what, and concluded the synchronized wasn't needed, quite possible I made a mistake.
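For readers following the thread, here is a minimal sketch of the kind of locking split being discussed. It is not the real DFSInputStream and not the attached patch: the class TwoModeStream, its fields, and its in-memory byte array are hypothetical stand-ins. The only idea it illustrates is that a stateful read() can keep holding the stream monitor while a positional pread() takes just a small "infoLock" to snapshot shared metadata, so pread() never waits on a long-running read().

```java
/** Hypothetical stand-in for illustration only; not the real DFSInputStream. */
final class TwoModeStream {
    // Guards only the metadata shared by both modes (assumed name, taken from the thread).
    private final Object infoLock = new Object();

    private final byte[] data;  // stands in for the remote file contents
    private long fileLength;    // shared metadata, guarded by infoLock
    private int pos;            // stateful-read cursor, guarded by the stream monitor (this)

    TwoModeStream(byte[] data) {
        this.data = data;
        this.fileLength = data.length;
    }

    /** Stateful read: holds the stream monitor for the whole call. */
    public synchronized int read() {
        long len;
        synchronized (infoLock) {   // short critical section for shared metadata
            len = fileLength;
        }
        return pos < len ? (data[pos++] & 0xff) : -1;
    }

    /** Positional read: never takes the stream monitor, only infoLock. */
    public int pread(long position) {
        long len;
        synchronized (infoLock) {
            len = fileLength;
        }
        if (position < 0 || position >= len) {
            return -1;
        }
        return data[(int) position] & 0xff;
    }
}
```

In this sketch pread() leaves the read() cursor untouched, and a thread blocked inside read() only delays other read() callers.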
Another locking option is not to synchronize on the stream object at all, but to have two locks ("streamLock" and "pLock", or whatever are good names). That way the intent might be more explicit.
Yet another option would be to disentangle the two APIs by subclassing or delegation (since the issue really is that we have state for two different modes of operation in the same class); that'd be a bigger change, though.

Meanwhile in HBase land: I tested this with HBase and observed with a sampler that all delays internal to DFSInputStream are gone, which is nice.
I committed a change to HBase to allow us to (1) have compactions use their own input streams so they do not interfere with user scans along the same files and (2) optionally force p-reads for all user scans. See HBASE-12411.
Especially with #2 I see nice speedups for many concurrent scanners, essentially up to what my disks can sustain, but a 50% slowdown with only a single scanner per file, which is expected since we're no longer benefiting from prefetching.

> A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream
> -----------------------------------------------------------------------------------------
>
>                 Key: HDFS-6735
>                 URL: https://issues.apache.org/jira/browse/HDFS-6735
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs-client
>    Affects Versions: 3.0.0
>            Reporter: Liang Xie
>            Assignee: Liang Xie
>         Attachments: HDFS-6735-v2.txt, HDFS-6735-v3.txt, HDFS-6735.txt
>
>
> In the current DFSInputStream impl, there are a couple of coarse-grained locks in the read/pread path, and this has become an HBase read latency pain point. In HDFS-6698, I made a minor patch against the first lock encountered, around getFileLength. Indeed, after reading the code and testing, it turns out there are still other locks we could improve.
> In this jira, I'll make a patch against the other locks, plus a simple test case to show the issue and the improved result.
> This is important for HBase applications, since in the current HFile read path we issue all read()/pread() requests on the same DFSInputStream for one HFile. (A multi-stream solution is another story that I had planned to do, but it will probably take more time than I expected.)

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)