Date: Sat, 14 Dec 2013 12:12:07 +0000 (UTC)
From: "Liang Xie (JIRA)"
To: hdfs-issues@hadoop.apache.org
Subject: [jira] [Commented] (HDFS-5664) try to relieve the BlockReaderLocal read() synchronized hotspot

    [ https://issues.apache.org/jira/browse/HDFS-5664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13848340#comment-13848340 ]

Liang Xie commented on HDFS-5664:
---------------------------------

bq. since there would still be a big "synchronized" on all the DFSInputStream#read methods which use the BlockReader

This can be fixed by HDFS-1605, e.g. by using a read lock for read().

bq. If multiple threads want to read the same file at the same time, they can open multiple distinct streams for it.
At that point, they're not sharing the same BlockReader, so whether or not BRL is synchronized doesn't matter.

Yes, this is a feasible idea.
But in the current HBase codebase, we use only one stream per HFile (or two streams in older versions, depending on whether checksums are used), so this looks like a critical performance issue. We should figure out whether it is possible to remove the synchronized keyword in BlockReader, or whether we must consider using a multiple-thread pattern. [~stack], are you familiar with this: why has HBase historically always used one stream per HFile?
I'll try to understand some of the background here as well.

> try to relieve the BlockReaderLocal read() synchronized hotspot
> ---------------------------------------------------------------
>
>                 Key: HDFS-5664
>                 URL: https://issues.apache.org/jira/browse/HDFS-5664
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs-client
>    Affects Versions: 3.0.0, 2.2.0
>            Reporter: Liang Xie
>            Assignee: Liang Xie
>
> Currently the BlockReaderLocal's read has a synchronized modifier:
> {code}
> public synchronized int read(byte[] buf, int off, int len) throws IOException {
> {code}
> In an HBase cluster with heavy physical reads, we observed some hotspots in the dfsclient path; the detailed trace can be found at: https://issues.apache.org/jira/browse/HDFS-1605?focusedCommentId=13843241&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13843241
> I haven't looked into the details yet; here are some raw ideas first:
> 1) replace synchronized with a try-lock-with-timeout pattern, so a read can fail fast;
> 2) fall back to non-SSR mode if acquiring the local reader lock fails.
> There are at least two scenarios where removing this hotspot would help:
> 1) Heavy local physical reads, e.g. when the HBase block cache miss ratio is high
> 2) A slow or bad disk.
> It would help achieve a lower 99th percentile HBase read latency.

--
This message was sent by Atlassian JIRA
(v6.1.4#6159)
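
The two raw ideas in the issue description (try lock with timeout, then fall back when the local reader lock cannot be acquired) could be sketched roughly as below. This is only an illustration under stated assumptions, not the HDFS implementation: `ByteSource` and `FailFastReader` are hypothetical names rather than HDFS classes, the fallback source is assumed to be safe for concurrent use, and the sketch ignores the need to position the fallback reader at the same file offset.

```java
import java.io.IOException;
import java.io.InterruptedIOException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical reader abstraction for this sketch; not the HDFS BlockReader interface. */
interface ByteSource {
    int read(byte[] buf, int off, int len) throws IOException;
}

/**
 * Guards a "local" source with a timed tryLock instead of a synchronized method.
 * If the lock is not acquired within the timeout, the read is served by the
 * fallback source (standing in for the non-SSR path) instead of blocking.
 */
final class FailFastReader implements ByteSource {
    private final ReentrantLock lock = new ReentrantLock();
    private final ByteSource local;
    private final ByteSource fallback; // assumed to be safe for concurrent use
    private final long timeoutMillis;

    FailFastReader(ByteSource local, ByteSource fallback, long timeoutMillis) {
        this.local = local;
        this.fallback = fallback;
        this.timeoutMillis = timeoutMillis;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        boolean acquired;
        try {
            // Wait at most timeoutMillis for the lock rather than blocking indefinitely.
            acquired = lock.tryLock(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            throw new InterruptedIOException("interrupted while waiting for the local reader lock");
        }
        if (!acquired) {
            // Another thread holds the local reader: fail fast and use the fallback path.
            return fallback.read(buf, off, len);
        }
        try {
            return local.read(buf, off, len);
        } finally {
            lock.unlock();
        }
    }
}
```

With this shape, a slow local read (for example on a bad disk) delays only the thread that holds the lock; other threads wait at most `timeoutMillis` before taking the fallback path. Whether that trade-off actually lowers the 99th percentile latency would need to be measured.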