hadoop-common-user mailing list archives

From Vimal Jain <vkj...@gmail.com>
Subject Re: High Full GC count for Region server
Date Tue, 29 Oct 2013 05:18:52 GMT
Hi,
Here is my analysis of this problem. Please correct me if I am wrong somewhere.
I have assigned 2 GB to the region server process. I think that is sufficient
to handle around 9 GB of data.
I have not changed many of the parameters, especially the memstore flush size,
which is 128 MB by default in 0.94.7.
Also, as per my understanding, each column family has one memstore associated
with it, so my memstores are taking 128*3 = 384 MB ( I have 3 column
families).
So I think I should reduce the memstore size to something like 32/64 MB so that
data is flushed to disk at a higher frequency than it is currently. This
will save some memory.
Is there any other parameter besides memstore size which affects memory
utilization?

Also, I am getting the below exceptions in the data node log and region server
log every day. Are they due to long GC pauses?

Data node logs :-

hadoop-hadoop-datanode-woody.log:2013-10-29 00:12:13,127 WARN
org.apache.hadoop.hdfs.server.datanode.DataNode:
DatanodeRegistration(192.168.20.30:50010,
storageID=DS-1816106352-192.168.20.30-50010-1369314076237,
infoPort=50075, ipcPort=50020):Got exception while serving
blk_-560908881317618221_58058 to /192.168.20.30:
hadoop-hadoop-datanode-woody.log:java.net.SocketTimeoutException: 480000
millis timeout while waiting for channel to be ready for write. ch :
java.nio.channels.SocketChannel[connected local=/192.168.20.30:50010
remote=/192.168.20.30:39413]
hadoop-hadoop-datanode-woody.log:2013-10-29 00:12:13,127 ERROR
org.apache.hadoop.hdfs.server.datanode.DataNode:
DatanodeRegistration(192.168.20.30:50010,
storageID=DS-1816106352-192.168.20.30-50010-1369314076237,
infoPort=50075, ipcPort=50020):DataXceiver
hadoop-hadoop-datanode-woody.log:java.net.SocketTimeoutException: 480000
millis timeout while waiting for channel to be ready for write. ch :
java.nio.channels.SocketChannel[connected local=/192.168.20.30:50010
remote=/192.168.20.30:39413]


Region server logs :-

hbase-hadoop-regionserver-woody.log:2013-10-29 01:01:16,475 WARN
org.apache.hadoop.ipc.HBaseServer: (responseTooSlow):
{"processingtimems":15827,"call":"multi(org.apache.hadoop.hbase.client.MultiAction@2918e464),
rpc version=1, client version=29,
methodsFingerPrint=-1368823753","client":"192.168.20.31:50619","starttimems":1382988660645,"queuetimems":0,"class":"HRegionServer","responsesize":0,"method":"multi"}
hbase-hadoop-regionserver-woody.log:2013-10-29 06:01:27,459 WARN
org.apache.hadoop.ipc.HBaseServer: (operationTooSlow):
{"processingtimems":14745,"client":"192.168.20.31:50908","timeRange":[0,9223372036854775807],"starttimems":1383006672707,"responsesize":55,"class":"HRegionServer","table":"event_data","cacheBlocks":true,"families":{"oinfo":["clubStatus"]},"row":"1752869","queuetimems":1,"method":"get","totalColumns":1,"maxVersions":1}





On Mon, Oct 28, 2013 at 11:55 PM, Asaf Mesika <asaf.mesika@gmail.com> wrote:

> Check through the HDFS UI that your cluster hasn't reached maximum disk
> capacity
>
> On Thursday, October 24, 2013, Vimal Jain wrote:
>
> > Hi Ted/Jean,
> > Can you please help here ?
> >
> >
> > On Tue, Oct 22, 2013 at 10:29 PM, Vimal Jain <vkjk89@gmail.com
> <javascript:;>>
> > wrote:
> >
> > > Hi Ted,
> > > Yes i checked namenode and datanode logs and i found below exceptions
> in
> > > both the logs:-
> > >
> > > Name node :-
> > > java.io.IOException: File
> > >
> >
> /hbase/event_data/433b61f2a4ebff8f2e4b89890508a3b7/.tmp/99797a61a8f7471cb6df8f7b95f18e9e
> > > could only be replicated to 0 nodes, instead of 1
> > >
> > > java.io.IOException: Got blockReceived message from unregistered or
> dead
> > > node blk_-2949905629769882833_52274
> > >
> > > Data node :-
> > > 480000 millis timeout while waiting for channel to be ready for write.
> ch
> > > : java.nio.channels.SocketChannel[connected local=/192.168.20.30:50010
> > >  remote=/192.168.20.30:36188]
> > >
> > > ERROR org.apache.hadoop.hdfs.server.datanode.DataNode:
> > > DatanodeRegistration(192.168.20.30:50010,
> > > storageID=DS-1816106352-192.168.20.30-50010-1369314076237,
> > infoPort=50075,
> > > ipcPort=50020):DataXceiver
> > >
> > > java.io.EOFException: while trying to read 39309 bytes
> > >
> > >
> > > On Tue, Oct 22, 2013 at 10:19 PM, Ted Yu <yuzhihong@gmail.com> wrote:
> > >
> > >> bq. java.io.IOException: File /hbase/event_data/
> > >> 4c3765c51911d6c67037a983d205a010/.tmp/bfaf8df33d5b4068825e3664d3e4b2b0
> > >> could
> > >> only be replicated to 0 nodes, instead of 1
> > >>
> > >> Have you checked Namenode / Datanode logs ?
> > >> Looks like hdfs was not stable.
> > >>
> > >>
> > >> On Tue, Oct 22, 2013 at 9:01 AM, Vimal Jain <vkjk89@gmail.com> wrote:
> > >>
> > >> > HI Jean,
> > >> > Thanks for your reply.
> > >> > I have total 8 GB memory and distribution is as follows:-
> > >> >
> > >> > Region server  - 2 GB
> > >> > Master,Namenode,Datanode,Secondary Namenode,Zookepeer - 1 GB
> > >> > OS - 1 GB
> > >> >
> > >> > Please let me know if you need more information.
> > >> >
> > >> >
> > >> > On Tue, Oct 22, 2013 at 8:15 PM, Jean-Marc Spaggiari <
> > >> > jean-marc@spaggiari.org> wrote:
> > >> >
> > >> > > Hi Vimal,
> > >> > >
> > >> > > What are your settings? Memory of the host, and memory allocated
> for
> > >> the
> > >> > > different HBase services?
> > >> > >
> > >> > > Thanks,
> > >> > >
> > >> > > JM
> > >> > >
> > >> > >
> > >> > > 2013/10/22 Vimal Jain <vkjk89@gmail.com>
> > >> > >
> > >> > > > Hi,
> > >> > > > I am running in Hbase in pseudo distributed mode. ( Hadoop
> > version -
> > >> > > 1.1.2
> > >> > > > , Hbase version - 0.94.7 )
> > >> > > > I am getting few exceptions in both hadoop ( namenode ,
> datanode)
> > >> logs
> > >> > > and
> > >> > > > hbase(region server).
> > >> > > > When i search for these exceptions on google , i concluded that
> > >> > > > problem is mainly due to large number of full GC in region server
> > >> > > > process.
> > >> > > >
> > >> > > > I used jstat and found that there are total of 950 full GCs in
> > >> > > > span of 4 days for region server process. Is this ok?
> > >> > > >
> > >> > > > I am totally confused by number of exceptions i am getting.
> > >> > > > Also i get below exceptions intermittently.
> > >> > > >
> > >> > > >
> > >> > > > Region server:-
> > >> > > >
> > >> > > > 2013-10-22 12:00:26,627 WARN org.apache.hadoop.ipc.HBaseServer:
> > >> > > > (responseTooSlow):
> > >> > > > {"processingtimems":15312,"call":"next(-6681408251916104762,
> > >> > > > 1000), rpc version=1, client version=29,
> > >> > > > methodsFingerPrint=-1368823753","client":"192.168.20.31:48270
> > >> > > > ","starttimems":1382423411293,"queuetimems":0,"class":"HRegionServer","responsesize":4808556,"method":"next"}
> > >> > > > 2013-10-22 12:06:17,606 WARN org.apache.hadoop.ipc.HBaseServer:
> > >> > > > (operationTooSlow): {"processingtimems":14759,"client":"192.168.20.31:48247
> > >> > > > ","timeRange":[0,9223372036854775807],"starttimems":1382423762845,"responsesize":61,"class":"HRegionServer","table":"event_data","cacheBlocks":true,"families":{"gin



-- 
Thanks and Regards,
Vimal Jain
