Return-Path:
X-Original-To: apmail-hadoop-hdfs-issues-archive@minotaur.apache.org
Delivered-To: apmail-hadoop-hdfs-issues-archive@minotaur.apache.org
Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 2D17EFBB2 for ; Fri, 25 Apr 2014 20:08:33 +0000 (UTC)
Received: (qmail 16479 invoked by uid 500); 25 Apr 2014 20:08:28 -0000
Delivered-To: apmail-hadoop-hdfs-issues-archive@hadoop.apache.org
Received: (qmail 16371 invoked by uid 500); 25 Apr 2014 20:08:24 -0000
Mailing-List: contact hdfs-issues-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
List-Help:
List-Unsubscribe:
List-Post:
List-Id:
Reply-To: hdfs-issues@hadoop.apache.org
Delivered-To: mailing list hdfs-issues@hadoop.apache.org
Received: (qmail 16327 invoked by uid 99); 25 Apr 2014 20:08:23 -0000
Received: from arcas.apache.org (HELO arcas.apache.org) (140.211.11.28) by apache.org (qpsmtpd/0.29) with ESMTP; Fri, 25 Apr 2014 20:08:23 +0000
Date: Fri, 25 Apr 2014 20:08:23 +0000 (UTC)
From: "stack (JIRA)"
To: hdfs-issues@hadoop.apache.org
Message-ID:
In-Reply-To:
References:
Subject: [jira] [Updated] (HDFS-6110) adding more slow action log in critical write path
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394

     [ https://issues.apache.org/jira/browse/HDFS-6110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HDFS-6110:
------------------------

    Attachment: HDFS-6110v6.txt

[~xieliang007] 's latest patch adding in offline review feedback I got from our Todd (See below): i.e. having one threshold for dfsclient (a higher one so folks MR'ing don't get annoyed by all the WARNings about slow i/o), and then another for datanode side which is much lower so we can see bad i/os.

{code}
16:38 < todd> stack: just looked at 6110.
had one more thought after commenting on the JIRA
16:38 < todd> you think we should add a separate config for client vs server?
16:38 < todd> I'm afraid that the 300ms default may be a little aggressive for the client - people using hadoop fs -put to upload files may get kind of nervous the next time they upgrade if they start seeing warnings
16:38 < todd> MR jobs too
16:39 < todd> may be better to have the client default be 10sec or something really long, and then HBase could tune it down for WAL files
16:39 < stack> todd: thanks boss
16:39 < todd> you think i'm crazy?
16:39 < stack> no
16:39 < stack> Testing it, it is "illuminating" to see how long stuff takes
16:39 < todd> k. yea
16:39 < todd> I had a patch like that once on the server side
16:39 < stack> Was worried though that it'd freak folks out.
16:40 < stack> Or, rather, they'd ignore what is being said and just consider it 'noise'.
16:40 < todd> yea
16:40 < todd> for a throughput app it is kind of noise
16:40 < todd> but hbase could definitely tune the default inside the RS down
16:40 < stack> Let me do as you suggest.
16:40 < todd> k
16:40 < stack> Thanks for review.
16:40 < todd> feel free to paste this convo into the jira so it makes sense :)
16:40 < todd> didn't want to post yet another comment and pollute everyone's mailboxes
16:41 * stack nod
{code}

> adding more slow action log in critical write path
> --------------------------------------------------
>
>                 Key: HDFS-6110
>                 URL: https://issues.apache.org/jira/browse/HDFS-6110
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: datanode
>    Affects Versions: 3.0.0, 2.3.0
>            Reporter: Liang Xie
>            Assignee: Liang Xie
>         Attachments: HDFS-6110-v2.txt, HDFS-6110.txt, HDFS-6110v3.txt, HDFS-6110v4.txt, HDFS-6110v5.txt, HDFS-6110v6.txt
>
>
> After digging into an HBase write spike issue caused by slow buffered I/O in our cluster, I realized we should add more abnormal-latency warning logs in the write flow, so that if others hit an HLog sync spike, we can get more detailed information from the HDFS side at the same time.
> Patch will be uploaded soon.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
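
To illustrate the two-threshold idea from the IRC discussion above, here is a minimal sketch in plain Java. It is not the code from the attached patch: the class name SlowIoWarner, its methods, and the constant names are hypothetical. The default values are assumptions taken from the conversation (300 ms on the datanode side, roughly 10 s on the client side) and may differ from what the patch actually uses.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper, not part of HDFS: records a warning whenever a
 * timed action takes longer than a configured threshold.
 */
class SlowIoWarner {
    // Assumed defaults, taken from the IRC discussion rather than the patch.
    static final long DATANODE_THRESHOLD_MS = 300;    // low, so bad I/Os show up
    static final long CLIENT_THRESHOLD_MS = 10_000;   // high, so bulk users are not spammed

    private final String side;
    private final long thresholdMs;
    private final List<String> warnings = new ArrayList<>();

    SlowIoWarner(String side, long thresholdMs) {
        this.side = side;
        this.thresholdMs = thresholdMs;
    }

    /** Records a warning if elapsedMs exceeds the threshold; returns true if it did. */
    boolean check(String action, long elapsedMs) {
        if (elapsedMs <= thresholdMs) {
            return false;
        }
        warnings.add("WARN [" + side + "] slow " + action + " took " + elapsedMs
                + "ms (threshold=" + thresholdMs + "ms)");
        return true;
    }

    /** Runs the action, measures its wall-clock time, and checks it against the threshold. */
    boolean timed(String action, Runnable op) {
        long start = System.nanoTime();
        op.run();
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        return check(action, elapsedMs);
    }

    List<String> warnings() {
        return warnings;
    }
}
```

With these assumed defaults, a 450 ms flush would produce a warning from a warner built with DATANODE_THRESHOLD_MS but none from one built with CLIENT_THRESHOLD_MS. A latency-sensitive caller such as HBase could construct the client-side warner with a lower threshold for its WAL writes, which is the tuning Todd describes.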