Subject: Re: High iowait in idle hbase cluster
From: Ted Yu
To: "common-user@hadoop.apache.org"
Date: Wed, 2 Sep 2015 12:57:31 -0700

I assume you have enabled short-circuit read.

Can you capture region server stack trace(s) and pastebin them?

Thanks
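
One way to capture the stack traces requested above is to call the JDK's `jps` and `jstack` tools from a small script. This is only a sketch: it assumes both tools are on the PATH, that it runs as the same user as the region server JVM, and that the region server's main class name ends in `HRegionServer`. The function names and output file naming are made up for illustration.

```python
import subprocess

# Assumption: the region server JVM shows up in `jps -l` with a main class
# whose name ends in "HRegionServer".
REGION_SERVER_CLASS = "HRegionServer"


def find_region_server_pids(jps_output):
    """Return the PIDs of JVMs in `jps` output that look like region servers."""
    pids = []
    for line in jps_output.splitlines():
        parts = line.split(None, 1)  # "<pid> <main class>"
        if len(parts) == 2 and parts[1].strip().endswith(REGION_SERVER_CLASS):
            pids.append(int(parts[0]))
    return pids


def capture_stack_traces(output_dir="."):
    """Run jstack against every region server JVM found; return the files written."""
    jps_output = subprocess.run(
        ["jps", "-l"], capture_output=True, text=True, check=True
    ).stdout
    written = []
    for pid in find_region_server_pids(jps_output):
        dump = subprocess.run(
            ["jstack", str(pid)], capture_output=True, text=True, check=True
        ).stdout
        path = f"{output_dir}/regionserver-{pid}.jstack.txt"  # hypothetical naming
        with open(path, "w") as handle:
            handle.write(dump)
        written.append(path)
    return written
```

Calling `capture_stack_traces()` a few times, some seconds apart, gives several dumps to compare, which makes it easier to see which threads stay busy.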

On Wed, Sep 2, 2015 at 12:11 PM, Akmal Abbasov <akmal.abbasov@icloud.com> wrote:
Hi Ted,
I’ve checked the time when the addresses were changed, and this strange behaviour started weeks before it.

Yes, 10.10.8.55 is a region server and 10.10.8.54 is an hbase master.
Any thoughts?

Thanks

On 02 Sep 2015, at 18:45, Ted Yu <yuzhihong@gmail.com> wrote:

bq. change the ip addresses of the cluster nodes

Did this happen recently? If high iowait was observed after the change (you can look at the ganglia graph), there is a chance that the change was related.

BTW I assume 10.10.8.55 is where your region server resides.

Cheers

On Wed, Sep 2, 2015 at 9:39 AM, Akmal Abbasov <akmal.abbasov@icloud.com> wrote:
Hi Ted,
sorry, I forgot to mention:

> release of hbase / hadoop you're using
hbase hbase-0.98.7-hadoop2, hadoop hadoop-2.5.1

> were region servers doing compaction ?
I ran major compactions manually earlier today, but they seem to have completed already, judging by compactionQueueSize.

> have you checked region server logs ?
The datanode logs are full of messages like this:
2015-09-02 16:37:06,950 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /10.10.8.55:50010, dest: /10.10.8.54:32959, bytes: 19673, op: HDFS_READ, cliID: DFSClient_NONMAPREDUCE_1225374853_1, offset: 0, srvID: ee7d0634-89a3-4ada-a8ad-7848217327be, blockid: BP-329084760-10.32.0.180-1387281790961:blk_1075277914_1540222, duration: 7881815

p.s. we had to change the ip addresses of the cluster nodes, is it relevant?

Thanks.
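
To see which clients are generating the datanode traffic, the clienttrace lines can be parsed and totalled. The sketch below assumes the log lines follow the `key: value, key: value` layout of the message quoted above. The function names are illustrative, and the script makes no assumption about the unit of `duration`.

```python
import re
from collections import defaultdict

# Field names are taken from the clienttrace message quoted in this thread.
FIELD_RE = re.compile(
    r"\b(src|dest|bytes|op|cliID|offset|srvID|blockid|duration): ([^,\s]+)"
)


def parse_clienttrace(line):
    """Return the fields of one clienttrace line as a dict, or None for other lines."""
    if "DataNode.clienttrace" not in line:
        return None
    fields = dict(FIELD_RE.findall(line))
    for key in ("bytes", "offset", "duration"):
        if key in fields:
            fields[key] = int(fields[key])
    return fields


def bytes_by_client(lines):
    """Total the transferred bytes per (operation, client id) pair."""
    totals = defaultdict(int)
    for line in lines:
        fields = parse_clienttrace(line)
        if fields and "bytes" in fields:
            totals[(fields.get("op"), fields.get("cliID"))] += fields["bytes"]
    return dict(totals)


# The log line from this thread, used as sample input.
SAMPLE = (
    "2015-09-02 16:37:06,950 INFO "
    "org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: "
    "src: /10.10.8.55:50010, dest: /10.10.8.54:32959, bytes: 19673, "
    "op: HDFS_READ, cliID: DFSClient_NONMAPREDUCE_1225374853_1, offset: 0, "
    "srvID: ee7d0634-89a3-4ada-a8ad-7848217327be, "
    "blockid: BP-329084760-10.32.0.180-1387281790961:blk_1075277914_1540222, "
    "duration: 7881815"
)
```

Feeding the whole datanode log through `bytes_by_client` shows which `cliID` and operation account for most of the bytes, and the `dest` field identifies the host that issued each read.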

On 02 Sep 2015, at 18:20, Ted Yu <yuzhihong@gmail.com> wrote:
Please provide some more information:

release of hbase / hadoop you're using
were region servers doing compaction ?
have you checked region server logs ?

Thanks

On Wed, Sep 2, 2015 at 9:11 AM, Akmal Abbasov <akmal.abbasov@icloud.com> wrote:
Hi,
I’m seeing strange behaviour in an hbase cluster. It is almost idle, only <5 puts and gets.
But the data in hdfs is increasing, and region servers have very high iowait (>100, on a 2-core CPU).
iotop shows that datanode process is reading and writing all the time.
Any suggestions?

Thanks.
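
To put a number on the iowait independently of any monitoring tool, the share of CPU time spent in iowait can be computed from two readings of the aggregate `cpu` line in `/proc/stat` on Linux. This is a minimal sketch with made-up function names. It relies on the standard field order (user, nice, system, idle, iowait, irq, softirq, steal).

```python
import time


def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line of /proc/stat into its first eight counters."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line from /proc/stat")
    # user, nice, system, idle, iowait, irq, softirq, steal
    return [int(value) for value in parts[1:9]]


def iowait_percent(before, after):
    """Percentage of CPU time spent in iowait between two 'cpu' line samples."""
    first, second = parse_cpu_line(before), parse_cpu_line(after)
    deltas = [b - a for a, b in zip(first, second)]
    total = sum(deltas)
    if total <= 0:
        return 0.0
    return 100.0 * deltas[4] / total  # index 4 is the iowait counter


def read_cpu_line(path="/proc/stat"):
    """Read the first line of /proc/stat, which is the aggregate 'cpu' line."""
    with open(path) as handle:
        return handle.readline()


def sample_iowait(interval_seconds=5.0):
    """Take two samples interval_seconds apart and return the iowait percentage."""
    before = read_cpu_line()
    time.sleep(interval_seconds)
    return iowait_percent(before, read_cpu_line())
```

This value is a percentage of total CPU time across all cores and cannot exceed 100, so it may not be directly comparable to the ">100" figure quoted above.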




