Subject: Re: High Full GC count for Region server
From: Vimal Jain <vkjk89@gmail.com>
To: user@hbase.apache.org
Cc: user@hadoop.apache.org
Date: Tue, 29 Oct 2013 10:48:52 +0530

Hi,

Here is my analysis of this problem. Please correct me if I am wrong somewhere.

I have assigned 2 GB to the region server process. I think that is sufficient to handle around 9 GB of data.

I have not changed most of the parameters, in particular the memstore flush size, which defaults to 128 MB in 0.94.7. Also, as per my understanding, each column family has one memstore associated with it, so my memstores can take up to 128 * 3 = 384 MB (I have 3 column families). So I think I should reduce the memstore flush size to something like 32 or 64 MB, so that data is flushed to disk more frequently than it is now; this should save some memory (a rough sketch of the change is just below, before the logs).

Is there any other parameter besides the memstore size that affects memory utilization?

Also, I am getting the exceptions below in the data node log and the region server log every day. Are they due to long GC pauses?
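For reference, the change I have in mind would look roughly like this in hbase-site.xml. As I understand it, hbase.hregion.memstore.flush.size is the property that controls the per-memstore flush threshold in 0.94 (default 134217728 bytes, i.e. 128 MB); the 64 MB value below is only an example and I have not tested it:

  <property>
    <name>hbase.hregion.memstore.flush.size</name>
    <!-- default is 134217728 (128 MB); 67108864 = 64 MB, example value only -->
    <value>67108864</value>
  </property>

I believe hbase.regionserver.global.memstore.upperLimit / lowerLimit also cap total memstore usage as a fraction of the heap, so please correct me if those are the better knobs to tune here.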
Data node logs :-

hadoop-hadoop-datanode-woody.log:2013-10-29 00:12:13,127 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(192.168.20.30:50010, storageID=DS-1816106352-192.168.20.30-50010-1369314076237, infoPort=50075, ipcPort=50020):Got exception while serving blk_-560908881317618221_58058 to /192.168.20.30:
hadoop-hadoop-datanode-woody.log:java.net.SocketTimeoutException: 480000 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/192.168.20.30:50010 remote=/192.168.20.30:39413]
hadoop-hadoop-datanode-woody.log:2013-10-29 00:12:13,127 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(192.168.20.30:50010, storageID=DS-1816106352-192.168.20.30-50010-1369314076237, infoPort=50075, ipcPort=50020):DataXceiver
hadoop-hadoop-datanode-woody.log:java.net.SocketTimeoutException: 480000 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/192.168.20.30:50010 remote=/192.168.20.30:39413]

Region server logs :-

hbase-hadoop-regionserver-woody.log:2013-10-29 01:01:16,475 WARN org.apache.hadoop.ipc.HBaseServer: (responseTooSlow): {"processingtimems":15827,"call":"multi(org.apache.hadoop.hbase.client.MultiAction@2918e464), rpc version=1, client version=29, methodsFingerPrint=-1368823753","client":"192.168.20.31:50619","starttimems":1382988660645,"queuetimems":0,"class":"HRegionServer","responsesize":0,"method":"multi"}
hbase-hadoop-regionserver-woody.log:2013-10-29 06:01:27,459 WARN org.apache.hadoop.ipc.HBaseServer: (operationTooSlow): {"processingtimems":14745,"client":"192.168.20.31:50908","timeRange":[0,9223372036854775807],"starttimems":1383006672707,"responsesize":55,"class":"HRegionServer","table":"event_data","cacheBlocks":true,"families":{"oinfo":["clubStatus"]},"row":"1752869","queuetimems":1,"method":"get","totalColumns":1,"maxVersions":1}


On Mon, Oct 28, 2013 at 11:55 PM, Asaf Mesika <asaf.mesika@gmail.com> wrote:

> Check through HDFS UI that your cluster haven't reached maximum disk
> capacity
>
> On Thursday, October 24, 2013, Vimal Jain wrote:
>
> > Hi Ted/Jean,
> > Can you please help here ?
> >
> >
> > On Tue, Oct 22, 2013 at 10:29 PM, Vimal Jain <vkjk89@gmail.com> wrote:
> >
> > > Hi Ted,
> > > Yes i checked namenode and datanode logs and i found below exceptions in
> > > both the logs:-
> > >
> > > Name node :-
> > > java.io.IOException: File
> > > /hbase/event_data/433b61f2a4ebff8f2e4b89890508a3b7/.tmp/99797a61a8f7471cb6df8f7b95f18e9e
> > > could only be replicated to 0 nodes, instead of 1
> > >
> > > java.io.IOException: Got blockReceived message from unregistered or dead
> > > node blk_-2949905629769882833_52274
> > >
> > > Data node :-
> > > 480000 millis timeout while waiting for channel to be ready for write. ch
> > > : java.nio.channels.SocketChannel[connected local=/192.168.20.30:50010
> > > remote=/192.168.20.30:36188]
> > >
> > > ERROR org.apache.hadoop.hdfs.server.datanode.DataNode:
> > > DatanodeRegistration(192.168.20.30:50010,
> > > storageID=DS-1816106352-192.168.20.30-50010-1369314076237, infoPort=50075,
> > > ipcPort=50020):DataXceiver
> > >
> > > java.io.EOFException: while trying to read 39309 bytes
> > >
> > >
> > > On Tue, Oct 22, 2013 at 10:19 PM, Ted Yu <yuzhihong@gmail.com> wrote:
> > >
> > >> bq. java.io.IOException: File /hbase/event_data/
> > >> 4c3765c51911d6c67037a983d205a010/.tmp/bfaf8df33d5b4068825e3664d3e4b2b0
> > >> could only be replicated to 0 nodes, instead of 1
> > >>
> > >> Have you checked Namenode / Datanode logs ?
> > >> Looks like hdfs was not stable.
> > >>
> > >>
> > >> On Tue, Oct 22, 2013 at 9:01 AM, Vimal Jain <vkjk89@gmail.com> wrote:
> > >>
> > >> > HI Jean,
> > >> > Thanks for your reply.
> > >> > I have total 8 GB memory and distribution is as follows:-
> > >> >
> > >> > Region server - 2 GB
> > >> > Master,Namenode,Datanode,Secondary Namenode,Zookepeer - 1 GB
> > >> > OS - 1 GB
> > >> >
> > >> > Please let me know if you need more information.
> > >> >
> > >> >
> > >> > On Tue, Oct 22, 2013 at 8:15 PM, Jean-Marc Spaggiari <
> > >> > jean-marc@spaggiari.org> wrote:
> > >> >
> > >> > > Hi Vimal,
> > >> > >
> > >> > > What are your settings? Memory of the host, and memory allocated for the
> > >> > > different HBase services?
> > >> > >
> > >> > > Thanks,
> > >> > >
> > >> > > JM
> > >> > >
> > >> > >
> > >> > > 2013/10/22 Vimal Jain <vkjk89@gmail.com>
> > >> > >
> > >> > > > Hi,
> > >> > > > I am running in Hbase in pseudo distributed mode. ( Hadoop version - 1.1.2
> > >> > > > , Hbase version - 0.94.7 )
> > >> > > > I am getting few exceptions in both hadoop ( namenode , datanode) logs and
> > >> > > > hbase(region server).
> > >> > > > When i search for these exceptions on google , i concluded that problem is
> > >> > > > mainly due to large number of full GC in region server process.
> > >> > > >
> > >> > > > I used jstat and found that there are total of 950 full GCs in span of 4
> > >> > > > days for region server process. Is this ok?
> > >> > > >
> > >> > > > I am totally confused by number of exceptions i am getting.
> > >> > > > Also i get below exceptions intermittently.
> > >> > > >
> > >> > > >
> > >> > > > Region server:-
> > >> > > >
> > >> > > > 2013-10-22 12:00:26,627 WARN org.apache.hadoop.ipc.HBaseServer:
> > >> > > > (responseTooSlow):
> > >> > > > {"processingtimems":15312,"call":"next(-6681408251916104762, 1000), rpc
> > >> > > > version=1, client version=29, methodsFingerPrint=-1368823753","client":"
> > >> > > > 192.168.20.31:48270
> > >> > > > ","starttimems":1382423411293,"queuetimems":0,"class":"HRegionServer","responsesize":4808556,"method":"next"}
> > >> > > > 2013-10-22 12:06:17,606 WARN org.apache.hadoop.ipc.HBaseServer:
> > >> > > > (operationTooSlow): {"processingtimems":14759,"client":"
> > >> > > > 192.168.20.31:48247
> > >> > > > ","timeRange":[0,9223372036854775807],"starttimems":1382423762845,"responsesize":61,"class":"HRegionServer","table":"event_data","cacheBlocks":true,"families":{"gin

--
Thanks and Regards,
Vimal Jain
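P.S. To check whether the slow-response warnings actually line up with GC pauses, I am planning to watch the full GC counters with jstat and to enable GC logging for the region server JVM. A rough sketch of what I have in mind (these are standard HotSpot options as far as I know; the PID and log path are placeholders, and I have not applied this yet):

  # sample GC counters of the running region server every 5 seconds (FGC = full GC count)
  jstat -gcutil <regionserver-pid> 5000

  # in conf/hbase-env.sh: write GC activity for the region server JVM to a log file
  export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:/path/to/logs/regionserver-gc.log"

If the responseTooSlow timestamps match long pauses in the GC log, that would confirm the full GCs are the cause.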