hbase-user mailing list archives

From Thomas Downing <tdown...@proteus-technologies.com>
Subject Re: High ingest rate and FIN_WAIT1 problems
Date Tue, 20 Jul 2010 08:58:13 GMT
Yes, I did try the timeout of 0.  As expected, I did not see sockets
in FIN_WAIT2 or TIME_WAIT for very long.

I still leak sockets at the ingest rates I need - the FIN_WAIT1
problem.  Also, with more careful observation this time around,
I noted that even before the FIN_WAIT1 problem starts to crop
up (at around 1600M inserts), there is already a slower socket
leak with timeout=0 and no FIN_WAIT1 problem.  At 100M inserts,
sockets were hovering around 50-60; by 800M they were around
200, and at 1600M they were at 400.  This is about half the leak
rate I see without the timeout set to 0, but it is still ultimately
fatal.
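
For reference, socket counts like these can be taken on Linux by
tallying /proc/net/tcp by connection state.  The following is a minimal
sketch, not the method used in this thread: it assumes the standard
/proc/net/tcp column layout, and the function name count_tcp_states is
hypothetical.  It also reports how many FIN_WAIT1 sockets still hold
unsent data in their send queue.

```python
from collections import Counter

# Kernel state codes as they appear in the "st" column of /proc/net/tcp.
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV",
    "04": "FIN_WAIT1", "05": "FIN_WAIT2", "06": "TIME_WAIT",
    "07": "CLOSE", "08": "CLOSE_WAIT", "09": "LAST_ACK",
    "0A": "LISTEN", "0B": "CLOSING",
}


def count_tcp_states(proc_net_tcp_text):
    """Tally sockets per TCP state from the text of /proc/net/tcp.

    Returns (counts, fin_wait1_with_data), where the second value is the
    number of FIN_WAIT1 sockets whose send queue is not empty.
    """
    counts = Counter()
    fin_wait1_with_data = 0
    # The first line is the column header, so skip it.
    for line in proc_net_tcp_text.splitlines()[1:]:
        fields = line.split()
        if len(fields) < 5:
            continue  # blank or truncated line
        state = TCP_STATES.get(fields[3], fields[3])
        counts[state] += 1
        # Column 5 is "tx_queue:rx_queue", both in hex.
        tx_queue = int(fields[4].split(":")[0], 16)
        if state == "FIN_WAIT1" and tx_queue > 0:
            fin_wait1_with_data += 1
    return counts, fin_wait1_with_data
```

To use it, pass in the contents of /proc/net/tcp (and /proc/net/tcp6 if
IPv6 sockets are in play) at intervals during the ingest run and compare
the totals over time.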

This socket increase is all between hbase and hadoop, none
between test client and hbase.

While the FIN_WAIT1 problem is triggered by an hbase side
issue, I have no indication of which side causes this other leak.

thanks

thomas downing

On 7/19/2010 4:31 PM, Ryan Rawson wrote:
> Did you try the setting I suggested?  There is/was a known bug in HDFS
> which can cause issues which may include "abandoned" sockets such as
> you are describing.
>
> -ryan
>
> On Mon, Jul 19, 2010 at 2:13 AM, Thomas Downing
> <tdowning@proteus-technologies.com> wrote:
>    
>> Thanks for the response, but my problem is not with FIN_WAIT2, it
>> is with FIN_WAIT1.
>>
>> If it were FIN_WAIT2, the only concern would be socket leakage,
>> and if setting the timeout solved the issue, that would be great.
>>
>> The problem with FIN_WAIT1 is twofold.  First, it is incumbent on
>> the application to notice and handle this problem; from the TCP stack's
>> point of view, there is nothing wrong.  It is just a special case of a
>> slow consumer.  The other problem is that something will be lost if the
>> socket is abandoned: there is data in the send queue of the socket in
>> FIN_WAIT1 that has not yet been delivered to the peer.
>>
>> On 7/16/2010 3:56 PM, Ryan Rawson wrote:
>>      
>>> I've been running with this setting on both the HDFS side and the
>>> HBase side for over a year now, it's a bit of voodoo but you might be
>>> running into well known suckage of HDFS.  Try this one and restart
>>> your hbase & hdfs.
>>>
>>> The FIN_WAIT2/TIME_WAIT happens more on large concurrent gets, not so
>>> much for inserts.
>>>
>>> <property>
>>> <name>dfs.datanode.socket.write.timeout</name>
>>> <value>0</value>
>>> </property>
>>>
>>> -ryan
>>>
>>>
>>> On Fri, Jul 16, 2010 at 9:33 AM, Thomas Downing
>>> <tdowning@proteus-technologies.com> wrote:
>>>
>>>        
>>>> Thanks for the response.
>>>>
>>>> My understanding is that TCP_FIN_TIMEOUT affects only FIN_WAIT2,
>>>> my problem is with FIN_WAIT1.
>>>>
>>>> While I do see some sockets in TIME_WAIT, they are only a few, and the
>>>> number is not growing.
>>>>
>>>> On 7/16/2010 12:07 PM, Hegner, Travis wrote:
>>>>
>>>>          
>>>>> Hi Thomas,
>>>>>
>>>>> I ran into a very similar issue when running slony-I on postgresql to
>>>>> replicate 15-20 databases.
>>>>>
>>>>> Adjusting the TCP_FIN_TIMEOUT parameters for the kernel may help to slow
>>>>> (or hopefully stop) the leaking sockets. I found some notes about
>>>>> adjusting TCP parameters here:
>>>>> http://www.hikaro.com/linux/tweaking-tcpip-syctl-conf.html
>>>>>
>>>>>
>>>>>            
>> [snip]
>>
>>      

