From: Akmal Abbasov
To: user@hadoop.apache.org
Subject: Re: High iowait in idle hbase cluster
Date: Wed, 02 Sep 2015 21:11:05 +0200

Hi Ted,
I’ve checked when the addresses were changed, and this strange behaviour started weeks before that.

Yes, 10.10.8.55 is a region server and 10.10.8.54 is the HBase master.
Any thoughts?

Thanks

On 02 Sep 2015, at 18:45, Ted Yu <yuzhihong@gmail.com> wrote:

bq. change the ip addresses of the cluster nodes

Did this happen recently? If the high iowait was first observed after the change (you can check the Ganglia graphs), there is a chance that the change was related.

BTW I assume 10.10.8.55 is where your region server resides.

Cheers

On Wed, Sep 2, 2015 at 9:39 AM, Akmal Abbasov <akmal.abbasov@icloud.com> wrote:
Hi Ted,
sorry, I forgot to mention:

> release of hbase / hadoop you're using
HBase hbase-0.98.7-hadoop2, Hadoop hadoop-2.5.1

> were region servers doing compaction ?
I’ve run major compactions manually earlier today, but judging by compactionQueueSize they appear to have already completed.

> have you checked region server logs ?
The datanode logs are full of messages like this:
2015-09-02 16:37:06,950 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /10.10.8.55:50010, dest: /10.10.8.54:32959, bytes: 19673, op: HDFS_READ, cliID: DFSClient_NONMAPREDUCE_1225374853_1, offset: 0, srvID: ee7d0634-89a3-4ada-a8ad-7848217327be, blockid: BP-329084760-10.32.0.180-1387281790961:blk_1075277914_1540222, duration: 7881815

P.S. We had to change the IP addresses of the cluster nodes. Is that relevant?

Thanks.
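
Editorial sketch (not part of the original message): one way to see who is generating the datanode traffic is to total the bytes in the clienttrace entries per operation, destination host, and client ID. The snippet below is a minimal Python sketch under stated assumptions: the regular expression is derived only from the single log line quoted above, and the names `CLIENTTRACE` and `summarize` are hypothetical helpers, not part of Hadoop.

```python
import re
from collections import Counter

# Pattern derived from the one clienttrace line quoted in this thread;
# other Hadoop versions may lay the fields out differently (assumption).
CLIENTTRACE = re.compile(
    r"clienttrace: src: /(?P<src>[\d.]+):\d+, dest: /(?P<dest>[\d.]+):\d+, "
    r"bytes: (?P<bytes>\d+), op: (?P<op>\w+), cliID: (?P<client>[^,]+),"
)


def summarize(lines):
    """Return total bytes keyed by (op, destination host, client ID)."""
    totals = Counter()
    for line in lines:
        match = CLIENTTRACE.search(line)
        if match is None:
            continue  # skip anything that is not a clienttrace entry
        key = (match["op"], match["dest"], match["client"])
        totals[key] += int(match["bytes"])
    return totals


# The log line from the message above, used as sample input.
sample = (
    "2015-09-02 16:37:06,950 INFO "
    "org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: "
    "src: /10.10.8.55:50010, dest: /10.10.8.54:32959, bytes: 19673, "
    "op: HDFS_READ, cliID: DFSClient_NONMAPREDUCE_1225374853_1, offset: 0, "
    "srvID: ee7d0634-89a3-4ada-a8ad-7848217327be, "
    "blockid: BP-329084760-10.32.0.180-1387281790961:blk_1075277914_1540222, "
    "duration: 7881815"
)

for key, total in summarize([sample]).most_common():
    print(key, total)
```

Feeding the whole datanode log through `summarize` (for example, by passing an open file object) should show whether a single client ID or destination host accounts for most of the reads.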

On 02 Sep 2015, at 18:20, Ted Yu <yuzhihong@gmail.com> wrote:

Please provide some more information:

release of hbase / hadoop you're using
were region servers doing compaction ?
have you checked region server logs ?

Thanks

On Wed, Sep 2, 2015 at 9:11 AM, Akmal Abbasov <akmal.abbasov@icloud.com> wrote:
Hi,
I’m seeing strange behaviour in an HBase cluster. It is almost idle, only <5 puts and gets.
But the data in HDFS keeps growing, and the region servers have very high iowait (>100, on a 2-core CPU).
iotop shows that the datanode process is reading and writing all the time.
Any suggestions?

Thanks.
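
Editorial sketch (not part of the original message): the iowait figure in the report can be cross-checked on a Linux node by sampling the aggregate `cpu` line of `/proc/stat` twice and comparing the counters. The Python below is a minimal sketch assuming the standard field order of that line (user, nice, system, idle, iowait, irq, softirq, steal, guest, guest_nice). `iowait_percent` and `measure` are hypothetical helper names, and the result is normalised across all cores, so it cannot exceed 100.

```python
import time


def iowait_percent(before, after):
    """Percent of CPU time spent in iowait between two aggregate 'cpu' lines."""

    def counters(line):
        parts = line.split()
        if not parts or parts[0] != "cpu":
            raise ValueError("expected the aggregate 'cpu' line of /proc/stat")
        return [int(value) for value in parts[1:]]

    deltas = [new - old for old, new in zip(counters(before), counters(after))]
    # Sum the first eight fields only: guest time is already counted in user/nice.
    total = sum(deltas[:8])
    if total <= 0:
        return 0.0
    return 100.0 * deltas[4] / total  # index 4 is the iowait counter


def measure(interval=5.0, path="/proc/stat"):
    """Sample /proc/stat twice, `interval` seconds apart (Linux only)."""
    with open(path) as stat:
        before = stat.readline()
    time.sleep(interval)
    with open(path) as stat:
        after = stat.readline()
    return iowait_percent(before, after)
```

Calling `measure()` on a region server while iotop shows the datanode busy gives a number that can be compared with the monitoring graphs. A reading above 100 in a graph would suggest a per-core sum rather than a normalised percentage, though that is only a guess about how the figure in this thread was produced.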



