hbase-user mailing list archives

From Andrew Purtell <apurt...@apache.org>
Subject Re: Re: Re: Re: taking snapshots creates too many TCP CLOSE_WAIT handles on the hbase master server
Date Thu, 08 May 2014 23:55:51 GMT
Thanks for the detail.

Unless you've changed it, port 50010 is the *DataNode* data transfer
socket. I'm surprised the HDFS tunings suggested by others on this thread
have not had an impact.
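
If it helps to double-check, here's a rough sketch (it assumes the master PID
793 from your pastebin and the default DataNode transfer port 50010, so adjust
both as needed) that groups the master's CLOSE_WAIT sockets by remote endpoint:

  # List the master's TCP sockets stuck in CLOSE_WAIT, with remote endpoints.
  lsof -nP -a -p 793 -i TCP | grep CLOSE_WAIT

  # Count them per remote address:port. A remote port of 50010 means the
  # peers are DataNode data transfer sockets, not regionserver RPC (60020).
  lsof -nP -a -p 793 -i TCP | awk '/CLOSE_WAIT/ {split($(NF-1), a, "->"); print a[2]}' \
    | sort | uniq -c | sort -rn

If those counts cluster on port 50010, that squares with the HDFS client being
the one leaving sockets half-closed on the master side.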

I filed https://issues.apache.org/jira/browse/HBASE-11142 to track this
report.
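
Until that is resolved, a quick way to capture numbers for the JIRA (only a
sketch; it assumes the master PID 793 and a table named 'testtable', both of
which you would need to adjust) is to compare the master's CLOSE_WAIT count
before and after a snapshot:

  # Take one snapshot and compare the master's CLOSE_WAIT count before/after.
  before=$(lsof -nP -a -p 793 -i TCP | grep -c CLOSE_WAIT)
  echo "snapshot 'testtable', 'snap_$(date +%s)'" | hbase shell
  sleep 60
  after=$(lsof -nP -a -p 793 -i TCP | grep -c CLOSE_WAIT)
  echo "CLOSE_WAIT before=$before after=$after"

If "after" keeps climbing across runs and never comes back down, that is the
behavior worth noting on the issue.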



On Mon, May 5, 2014 at 5:19 PM, Hansi Klose <hansi.klose@web.de> wrote:

> Hi Andrew,
>
> here is the output from our testing environment.
> There we can see the same behavior as in our production environment.
>
> Sorry if my description was not clear.
> The connection source is the HBase master process, PID 793, and the targets
> are the DataNode ports of our 3 region servers.
>
> hbase master:   lsof | grep TCP | grep CLOSE_WAIT
>
> http://pastebin.com/BTyiVgb2
>
> Here are 40 connections in state CLOSE_WAIT to our 3 region servers.
> These connections have been there since last week.
>
> Regards Hansi
>
> > Sent: Wednesday, 30 April 2014, 18:48
> > From: "Andrew Purtell" <apurtell@apache.org>
> > To: "user@hbase.apache.org" <user@hbase.apache.org>
> > Subject: Re: Re: Re: taking snapshots creates too many TCP CLOSE_WAIT
> > handles on the hbase master server
> >
> > Let's circle back to the original mail:
> >
> > > When I run lsof I saw that there were a lot of TCP CLOSE_WAIT handles
> > > open with the regionserver as target.
> >
> > Is that right? *Regionserver*, not another process (datanode or
> > whatever)? Or did I miss evidence somewhere along this thread
> > confirming that a datanode was the remote end?
> >
> > If you are sure that the stuck connections are to the regionserver
> > process (maybe pastebin the lsof output so we can double check the port
> > numbers involved?), then the regionserver is closing the connection but
> > the master somehow is not, by definition of what CLOSE_WAIT means. HDFS
> > settings won't matter if it is the master that is failing to close a
> > socket; maybe this is an IPC bug.
> >
> >
> >
> > On Wed, Apr 30, 2014 at 12:38 AM, Hansi Klose <hansi.klose@web.de> wrote:
> >
> > > Hi,
> > >
> > > Sorry, I missed that  :-(
> > >
> > > I tried that parameter in my hbase-site.xml and restarted the HBase
> > > master and all region servers.
> > >
> > >   <property>
> > >     <name>dfs.client.socketcache.expiryMsec</name>
> > >     <value>900</value>
> > >   </property>
> > >
> > > No change, the CLOSE_WAIT sockets still persist on the HBase master to
> > > the region servers' DataNodes after taking snapshots.
> > >
> > > Because it was not clear to me where the setting has to go,
> > > I put it in our hdfs-site.xml too and restarted all DataNodes.
> > > I thought that settings prefixed with dfs.client might have to go there.
> > > But this did not change the behavior either.
> > >
> > > Regards Hansi
> > >
> > > > Sent: Tuesday, 29 April 2014, 19:21
> > > > From: Stack <stack@duboce.net>
> > > > To: Hbase-User <user@hbase.apache.org>
> > > > Subject: Re: Re: taking snapshots creates too many TCP CLOSE_WAIT
> > > > handles on the hbase master server
> > > >
> > > > On Tue, Apr 29, 2014 at 8:15 AM, Hansi Klose <hansi.klose@web.de> wrote:
> > > >
> > > > > Hi all,
> > > > >
> > > > > sorry for the late answer.
> > > > >
> > > > > I configured the hbase-site.xml like this:
> > > > >
> > > > >   <property>
> > > > >     <name>dfs.client.socketcache.capacity</name>
> > > > >     <value>0</value>
> > > > >   </property>
> > > > >   <property>
> > > > >     <name>dfs.datanode.socket.reuse.keepalive</name>
> > > > >     <value>0</value>
> > > > >   </property>
> > > > >
> > > > > and restarted the HBase master and all region servers.
> > > > > I can still see the same behavior. Each snapshot creates
> > > > > new CLOSE_WAIT sockets which stay there until the HBase master is
> > > > > restarted.
> > > > >
> > > > > Is there any other setting I can try?
> > > > >
> > > >
> > > > You saw my last suggestion about
> > > > "...dfs.client.socketcache.expiryMsec to 900 in your HBase client
> > > > configuration.."?
> > > >
> > > > St.Ack
> > > >
> > >
> >
> >
> >
> > --
> > Best regards,
> >
> >    - Andy
> >
> > Problems worthy of attack prove their worth by hitting back. - Piet Hein
> > (via Tom White)
> >
>



-- 
Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein
(via Tom White)
