hbase-user mailing list archives

From "Hansi Klose" <hansi.kl...@web.de>
Subject Aw: Re: Re: Re: taking snapshots creates too many TCP CLOSE_WAIT handles on the hbase master server
Date Mon, 05 May 2014 09:19:53 GMT
Hi Andrew,

Here is the output from our testing environment.
There we can see the same behavior as in our production environment.

Sorry if my description was not clear.
The connection source is the hbase master process, PID 793, and the targets are
the datanode ports of our 3 regionservers.

hbase master:   lsof | grep TCP | grep CLOSE_WAIT

http://pastebin.com/BTyiVgb2

Here are 40 connections in state CLOSE_WAIT to our 3 regionservers.
These connections have been there since last week.
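
A rough way to group them by remote endpoint is something like the sketch
below (assuming the master PID 793 from above, and that field 9 of the lsof
output is the local->remote address pair, so the awk split only works for
that layout):

hbase master:   lsof -nP -p 793 | grep CLOSE_WAIT | awk '{split($9, a, "->"); print a[2]}' | sort | uniq -c | sort -rn

Each output line is then a count of stuck sockets followed by the remote
datanode address:port (50010 would be the default datanode data transfer
port, if the defaults are in use).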

Regards Hansi

> Sent: Wednesday, 30 April 2014 at 18:48
> From: "Andrew Purtell" <apurtell@apache.org>
> To: "user@hbase.apache.org" <user@hbase.apache.org>
> Subject: Re: Re: Re: taking snapshots creates too many TCP CLOSE_WAIT handles on the hbase master server
>
> Let's circle back to the original mail:
> 
> > When I run lsof I saw that there were a lot of TCP CLOSE_WAIT handles
> > open with the regionserver as target.
> 
> Is that right? *Regionserver*, not another process (datanode or whatever)?
> Or did I miss evidence somewhere along this thread confirming that a
> datanode was the remote?
> 
> If you are sure that the stuck connections are to the regionserver process
> (maybe pastebin lsof output so we can double check the port numbers
> involved?) then the regionserver is closing the connection but the master
> somehow is not, by definition of what CLOSE_WAIT means. HDFS settings won't
> matter if it is the master that is failing to close a socket; maybe this is
> an IPC bug.
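> 
> A quick way to double check would be to pull just the remote port out of
> the lsof output, something like this sketch (substitute the real master
> PID for <master pid>; 50010 and 60020 are only the default datanode data
> transfer and regionserver RPC ports, so adjust them if yours differ):
> 
> lsof -nP -p <master pid> | grep CLOSE_WAIT | awk '{n=split($9, a, ":"); print a[n]}' | sort | uniq -c
> 
> If the counts pile up on 50010 the remotes are datanodes; if they pile up
> on 60020 they are regionservers.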
> 
> 
> 
> On Wed, Apr 30, 2014 at 12:38 AM, Hansi Klose <hansi.klose@web.de> wrote:
> 
> > Hi,
> >
> > sorry I missed that  :-(
> >
> > I tried that parameter in my hbase-site.xml and restarted the hbase master
> > and all regionservers.
> >
> >   <property>
> >     <name>dfs.client.socketcache.expiryMsec</name>
> >     <value>900</value>
> >   </property>
> >
> > No change; the CLOSE_WAIT sockets to the regionservers' datanodes still
> > persist on the hbase master after taking snapshots.
> >
> > Because it was not clear to me where the setting has to go,
> > I put it in our hdfs-site.xml too and restarted all datanodes.
> > I thought that settings starting with dfs.client might have to go there.
> > But this did not change the behavior either.
> >
> > Regards Hansi
> >
> > > Sent: Tuesday, 29 April 2014 at 19:21
> > > From: Stack <stack@duboce.net>
> > > To: Hbase-User <user@hbase.apache.org>
> > > Subject: Re: Re: taking snapshots creates too many TCP CLOSE_WAIT handles on the hbase master server
> > >
> > > On Tue, Apr 29, 2014 at 8:15 AM, Hansi Klose <hansi.klose@web.de> wrote:
> > >
> > > > Hi all,
> > > >
> > > > sorry for the late answer.
> > > >
> > > > I configured the hbase-site.xml like this
> > > >
> > > >   <property>
> > > >     <name>dfs.client.socketcache.capacity</name>
> > > >     <value>0</value>
> > > >   </property>
> > > >   <property>
> > > >     <name>dfs.datanode.socket.reuse.keepalive</name>
> > > >     <value>0</value>
> > > >   </property>
> > > >
> > > > and restarted the hbase master and all regionservers.
> > > > I still see the same behavior. Each snapshot creates
> > > > new CLOSE_WAIT sockets which stay there until the hbase master is restarted.
> > > >
> > > > Is there any other setting I can try?
> > > >
> > >
> > > You saw my last suggestion about "...dfs.client.socketcache.expiryMsec to
> > > 900 in your HBase client configuration.."?
> > >
> > > St.Ack
> > >
> >
> 
> 
> 
> -- 
> Best regards,
> 
>    - Andy
> 
> Problems worthy of attack prove their worth by hitting back. - Piet Hein
> (via Tom White)
> 
