Date: Thu, 24 Oct 2013 11:19:54 +0530
Subject: Re: High Full GC count for Region server
From: Vimal Jain
To: "user@hbase.apache.org", user@hadoop.apache.org

Hi Ted/Jean,
Can you please help here?

On Tue, Oct 22, 2013 at 10:29 PM, Vimal Jain wrote:

> Hi Ted,
> Yes, I checked the namenode and datanode logs and found the exceptions
> below in both:
>
> Name node:
> java.io.IOException: File /hbase/event_data/433b61f2a4ebff8f2e4b89890508a3b7/.tmp/99797a61a8f7471cb6df8f7b95f18e9e could only be replicated to 0 nodes, instead of 1
>
> java.io.IOException: Got blockReceived message from unregistered or dead node blk_-2949905629769882833_52274
>
> Data node:
> 480000 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/192.168.20.30:50010 remote=/192.168.20.30:36188]
>
> ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(192.168.20.30:50010, storageID=DS-1816106352-192.168.20.30-50010-1369314076237, infoPort=50075, ipcPort=50020):DataXceiver
>
> java.io.EOFException: while trying to read 39309 bytes
>
> On Tue, Oct 22, 2013 at 10:19 PM, Ted Yu wrote:
>
> > bq. java.io.IOException: File /hbase/event_data/4c3765c51911d6c67037a983d205a010/.tmp/bfaf8df33d5b4068825e3664d3e4b2b0 could only be replicated to 0 nodes, instead of 1
> >
> > Have you checked the Namenode / Datanode logs?
> > Looks like HDFS was not stable.
> >
> > On Tue, Oct 22, 2013 at 9:01 AM, Vimal Jain wrote:
> >
> > > Hi Jean,
> > > Thanks for your reply.
> > > I have 8 GB of memory in total, and the distribution is as follows:
> > >
> > > Region server - 2 GB
> > > Master, Namenode, Datanode, Secondary Namenode, ZooKeeper - 1 GB
> > > OS - 1 GB
> > >
> > > Please let me know if you need more information.
> > >
> > > On Tue, Oct 22, 2013 at 8:15 PM, Jean-Marc Spaggiari <jean-marc@spaggiari.org> wrote:
> > >
> > > > Hi Vimal,
> > > >
> > > > What are your settings? Memory of the host, and memory allocated
> > > > for the different HBase services?
> > > >
> > > > Thanks,
> > > >
> > > > JM
> > > >
> > > > 2013/10/22 Vimal Jain
> > > >
> > > > > Hi,
> > > > > I am running HBase in pseudo-distributed mode (Hadoop version
> > > > > 1.1.2, HBase version 0.94.7).
> > > > > I am getting a few exceptions in both the Hadoop (namenode,
> > > > > datanode) logs and the HBase (region server) logs.
> > > > > When I searched for these exceptions on Google, I concluded that
> > > > > the problem is mainly due to the large number of full GCs in the
> > > > > region server process.
> > > > >
> > > > > I used jstat and found that there were a total of 950 full GCs
> > > > > in a span of 4 days for the region server process. Is this OK?
> > > > >
> > > > > I am totally confused by the number of exceptions I am getting.
> > > > > I also get the exceptions below intermittently.
> > > > >
> > > > > Region server:
> > > > >
> > > > > 2013-10-22 12:00:26,627 WARN org.apache.hadoop.ipc.HBaseServer: (responseTooSlow): {"processingtimems":15312,"call":"next(-6681408251916104762, 1000), rpc version=1, client version=29, methodsFingerPrint=-1368823753","client":"192.168.20.31:48270","starttimems":1382423411293,"queuetimems":0,"class":"HRegionServer","responsesize":4808556,"method":"next"}
> > > > > 2013-10-22 12:06:17,606 WARN org.apache.hadoop.ipc.HBaseServer: (operationTooSlow): {"processingtimems":14759,"client":"192.168.20.31:48247","timeRange":[0,9223372036854775807],"starttimems":1382423762845,"responsesize":61,"class":"HRegionServer","table":"event_data","cacheBlocks":true,"families":{"ginfo":["netGainPool"]},"row":"1629657","queuetimems":0,"method":"get","totalColumns":1,"maxVersions":1}
> > > > >
> > > > > 2013-10-18 10:37:45,008 WARN org.apache.hadoop.hdfs.DFSClient: DataStreamer Exception: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /hbase/event_data/4c3765c51911d6c67037a983d205a010/.tmp/bfaf8df33d5b4068825e3664d3e4b2b0 could only be replicated to 0 nodes, instead of 1
> > > > >     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1639)
> > > > >
> > > > > Name node:
> > > > > java.io.IOException: File /hbase/event_data/433b61f2a4ebff8f2e4b89890508a3b7/.tmp/99797a61a8f7471cb6df8f7b95f18e9e could only be replicated to 0 nodes, instead of 1
> > > > >
> > > > > java.io.IOException: Got blockReceived message from unregistered or dead node blk_-2949905629769882833_52274
> > > > >
> > > > > Data node:
> > > > > 480000 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/192.168.20.30:50010 remote=/192.168.20.30:36188]
> > > > >
> > > > > ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(192.168.20.30:50010, storageID=DS-1816106352-192.168.20.30-50010-1369314076237, infoPort=50075, ipcPort=50020):DataXceiver
> > > > > java.io.EOFException: while trying to read 39309 bytes
> > > > >
> > > > > --
> > > > > Thanks and Regards,
> > > > > Vimal Jain
> > >
> > > --
> > > Thanks and Regards,
> > > Vimal Jain
>
> --
> Thanks and Regards,
> Vimal Jain

--
Thanks and Regards,
Vimal Jain
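A note on the jstat figure quoted in the thread: 950 full GCs over 4 days is easier to judge as a rate. The following is only a minimal sketch, assuming the collections were spread evenly over the period (the thread does not say whether they were); the function name is illustrative and not part of any HBase or JDK tool.

```python
def avg_seconds_between_full_gcs(full_gc_count: int, span_days: float) -> float:
    """Average gap between full GCs, assuming they are spread evenly."""
    if full_gc_count <= 0:
        raise ValueError("full_gc_count must be positive")
    span_seconds = span_days * 24 * 60 * 60
    return span_seconds / full_gc_count


if __name__ == "__main__":
    # Figures reported in the thread: 950 full GCs in a span of 4 days.
    gap = avg_seconds_between_full_gcs(950, 4)
    print(f"about one full GC every {gap / 60:.1f} minutes")
```

With the reported numbers this works out to roughly one full GC every six minutes on average. The cumulative full-GC count is the FGC column of `jstat -gcutil <pid>`, so taking two samples and dividing the difference by the elapsed time gives the rate over a specific window instead of the whole uptime.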