Mailing-List: contact hdfs-issues-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: hdfs-issues@hadoop.apache.org
Date: Fri, 25 Apr 2014 20:08:23 +0000 (UTC)
From: "stack (JIRA)" <jira@apache.org>
To: hdfs-issues@hadoop.apache.org
Message-ID: <JIRA.12701861.1395051935152.183649.1398456503612@arcas>
In-Reply-To: <JIRA.12701861.1395051935152@arcas>
References: <JIRA.12701861.1395051935152@arcas>
Subject: [jira] [Updated] (HDFS-6110) adding more slow action log in
 critical write path
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit


     [ https://issues.apache.org/jira/browse/HDFS-6110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HDFS-6110:
------------------------

    Attachment: HDFS-6110v6.txt

[~xieliang007]'s latest patch, with the offline review feedback I got from our Todd folded in (see below): one threshold for the dfsclient (a higher one, so folks MR'ing don't get annoyed by all the WARNings about slow i/o), and another, much lower one for the datanode side so we can see bad i/os.

{code}
16:38 < todd> stack: just looked at 6110. had one more thought after commenting on the JIRA
16:38 < todd> you think we should add a separate config for client vs server?
16:38 < todd> I'm afraid that the 300ms default may be a little aggressive for the client - people using hadoop fs -put to upload files may get kind of nervous the next time they upgrade if they start
              seeing warnings
16:38 < todd> MR jobs too
16:39 < todd> may be better to have the client default be 10sec or something really long, and then HBase could tune it down for WAL files
16:39 < stack> todd: thanks boss
16:39 < todd> you think i'm crazy?
16:39 < stack> no
16:39 < stack> Testing it, it is "illuminating" to see how long stuff takes
16:39 < todd> k. yea
16:39 < todd> I had a patch like that once on the server side
16:39 < stack> Was worried though that it'd freak folks out.
16:40 < stack> Or, rather, they'd ignore what is being said and just consider it 'noise'.
16:40 < todd> yea
16:40 < todd> for a throughput app it is kind of noise
16:40 < todd> but hbase could definitely tune the default inside the RS down
16:40 < stack> Let me do as you suggest.
16:40 < todd> k
16:40 < stack> Thanks for review.
16:40 < todd> feel free to paste this convo into the jira so it makes sense :)
16:40 < todd> didn't want to post yet another comment and pollute everyone's mailboxes
16:41  * stack nod
{code}
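
To make the two-threshold idea concrete, here is a minimal sketch. It is not the code in the attached patch. The class name, method names, and log format are made up for illustration. The defaults are only the numbers floated in the chat above (300ms on the datanode side, 10sec on the client side) and may not match what the patch actually ships.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch of per-side slow i/o warnings; not the HDFS-6110 patch itself. */
public class SlowIoWarning {
    // Assumed defaults, taken from the discussion: low on the datanode so bad
    // i/os show up, high on the client so throughput apps are not spammed.
    public static final long DATANODE_THRESHOLD_MS = 300;
    public static final long CLIENT_THRESHOLD_MS = 10_000;

    private final String side;       // "client" or "datanode", used only in the message
    private final long thresholdMs;  // warn when an action takes longer than this

    public SlowIoWarning(String side, long thresholdMs) {
        this.side = side;
        this.thresholdMs = thresholdMs;
    }

    /** True when the elapsed time is strictly over this side's threshold. */
    public boolean isSlow(long elapsedMs) {
        return elapsedMs > thresholdMs;
    }

    /** Builds the warning line for a slow action, or returns null if it was fast enough. */
    public String warningFor(String what, long elapsedMs) {
        if (!isSlow(elapsedMs)) {
            return null;
        }
        return "WARN Slow " + what + " on " + side + " took " + elapsedMs
            + "ms (threshold=" + thresholdMs + "ms)";
    }

    /** Times the action with a monotonic clock and returns the warning line, if any. */
    public String timeAndCheck(String what, Runnable action) {
        long start = System.nanoTime();
        action.run();
        long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        return warningFor(what, elapsedMs);
    }
}
```

With these assumed defaults, a 500ms flush would warn on the datanode but stay quiet on the client. A latency-sensitive caller such as HBase could construct the client-side instance with a lower threshold for its WAL files.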

> adding more slow action log in critical write path
> --------------------------------------------------
>
>                 Key: HDFS-6110
>                 URL: https://issues.apache.org/jira/browse/HDFS-6110
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: datanode
>    Affects Versions: 3.0.0, 2.3.0
>            Reporter: Liang Xie
>            Assignee: Liang Xie
>         Attachments: HDFS-6110-v2.txt, HDFS-6110.txt, HDFS-6110v3.txt, HDFS-6110v4.txt, HDFS-6110v5.txt, HDFS-6110v6.txt
>
>
> After digging into an HBase write spike caused by slow buffered io in our cluster, we realized we should add more abnormal-latency warning logs in the write flow, so that when others hit an HLog sync spike, we have more detail from the HDFS side at the same time.
> Patch will be uploaded soon.

