hbase-user mailing list archives

From Weizhan Zeng <qgweiz...@gmail.com>
Subject Re: "java.net.SocketException: Too many open files" AbortRegionServer But not ShutDown
Date Tue, 03 Jan 2017 15:49:10 GMT
Sorry, I sent the previous message by mistake.

2017-01-03 23:48 GMT+08:00 Weizhan Zeng <qgweizhan@gmail.com>:

> Machine IP:
>
> LF-HBASE-VENUS-149106.hadoop.jd.local
>
>
> jstack output: /data0/hbase-logs/46384.out
>
> 2017-01-03 23:44 GMT+08:00 Weizhan Zeng <qgweizhan@gmail.com>:
>
>> My HBase version is 1.1.6 and my Hadoop version is 2.6.1. I have the jstack
>> output and can send it to you tomorrow once I am at the office.
>>
>> My guess is that "Too many open files" was caused by too many store files.
>> My monitoring shows a storeFileCount of 33K, while ulimit is 65535. There
>> seem to be so many store files because compaction was not working.
>>
>>
>>
>> What confuses me is why the RS did not exit.
>>
>>
>> 2017-01-03 23:05 GMT+08:00 Ted Yu <yuzhihong@gmail.com>:
>>
>>> Switching to user@
>>>
>>> What version of HBase / Hadoop are you using?
>>>
>>> Before issuing "kill -9", did you capture a stack trace of the region
>>> server process?
>>>
>>> Have you read 'Limits on Number of Files and Processes' under
>>> http://hbase.apache.org/book.html#basic.prerequisites ?
>>>
>>> On Tue, Jan 3, 2017 at 6:56 AM, Weizhan Zeng <qgweizhan@gmail.com>
>>> wrote:
>>>
>>> > Hi guys,
>>> > I ran into an issue on one of my RegionServers (RS).
>>> > After the SocketException happened, the RS should have shut down, but
>>> > 8 hours later I found it still alive and had to end it with kill -9.
>>> >
>>> > Here is my RegionServer log:
>>> >
>>> > At 01:58 AM, the SocketException happened:
>>> >
>>> >
>>> > [2017-01-02T01:58:00.469+08:00] [INFO] hdfs.DFSClient : Exception in createBlockOutputStream java.net.SocketException: Too many open files
>>> >     at sun.nio.ch.Net.socket0(Native Method)
>>> >     at sun.nio.ch.Net.socket(Net.java:423)
>>> >     at sun.nio.ch.Net.socket(Net.java:416)
>>> >     at sun.nio.ch.SocketChannelImpl.<init>(SocketChannelImp.java:104)
>>> >
>>> > Also at 01:58 AM, the RegionServer aborted itself and began closing regions:
>>> >
>>> >
>>> > [2017-01-02T01:58:00.632+08:00] [INFO] regionserver.HRegionServer : aborting server HBASE-VENUS-149106.hadoop.local,16020,1482236933819
>>> > [2017-01-02T01:58:00.632+08:00] [INFO] client.ConnectionManager$HConnectionImplementation : Closing zookeeper sessionid=0x456f9b55fda457b
>>> > [2017-01-02T01:58:00.632+08:00] [INFO] regionserver.HStore : Closed f
>>> >
>>> >
>>> > [2017-01-02T01:59:18.067+08:00] [INFO] regionserver.HRegionServer$MovedRegionsCleaner : Chore: MovedRegionsCleaner for region HBASE-VENUS-149106.hadoop.local,16020,1482236933819 was stopped
>>> > [2017-01-02T01:59:18.225+08:00] [INFO] regionserver.Replication : Normal source for cluster 1: Total replicated edits: 39081044, currently replicating from: hdfs://venus/hbase/oldWALs/HBASE-VENUS-149106.hadoop.local%2C16020%2C1482236933819.default.1483293299516 at position: 0
>>> >
>>> >
>>> > [2017-01-02T01:59:18.225+08:00] [INFO] regionserver.Replication : Sink: age in ms of last applied edit: 0, total replicated edits: 160769427
>>> >
>>> > After one hour, it was still logging:
>>> >
>>> >
>>> > [2017-01-02T02:04:18.225+08:00] [INFO] regionserver.Replication : Normal source for cluster 1: Total replicated edits: 39081044, currently replicating from: hdfs://venus/hbase/oldWALs/HBASE-VENUS-149106.hadoop.local%2C16020%2C1482236933819.default.1483293299516 at position: 0
>>> >
>>> > At 8 AM
>>> >
>>> >
>>> > [2017-01-02T08:09:18.225+08:00] [INFO] regionserver.Replication : Sink: age in ms of last applied edit: 0, total replicated edits: 160769427
>>> > [2017-01-02T08:14:18.225+08:00] [INFO] regionserver.Replication : Normal source for cluster 1: Total replicated edits: 39081044, currently replicating
>>> >
>>> > Can anyone give me some tips for tracking this down? Thanks.
>>> >
>>>
>>
>>
>
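
For reference, a minimal sketch of how the descriptor numbers discussed above (storeFileCount of 33K against a ulimit of 65535) can be put side by side. It is not HBase code: the class name FdHeadroom and the helper usedFraction are made up for illustration, and "33K" is assumed to mean roughly 33000. The main method reads the open and maximum descriptor counts of the JVM it runs in through the JDK's UnixOperatingSystemMXBean, so it reports on itself, not on a RegionServer; the same two attributes would have to be read from the RegionServer process over JMX to see its real usage.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;
import com.sun.management.UnixOperatingSystemMXBean;

// Hypothetical helper, not part of HBase.
final class FdHeadroom {

    // Fraction of the per-process descriptor limit in use (0.0 to 1.0).
    static double usedFraction(long open, long max) {
        if (max <= 0) {
            throw new IllegalArgumentException("max must be positive");
        }
        return (double) open / (double) max;
    }

    public static void main(String[] args) {
        // Descriptor usage of THIS JVM only (Unix-like platforms).
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof UnixOperatingSystemMXBean) {
            UnixOperatingSystemMXBean unix = (UnixOperatingSystemMXBean) os;
            long open = unix.getOpenFileDescriptorCount();
            long max = unix.getMaxFileDescriptorCount();
            System.out.printf("this JVM: open=%d max=%d used=%.1f%%%n",
                    open, max, 100.0 * usedFraction(open, max));
        } else {
            System.out.println("Descriptor counts are not exposed on this platform.");
        }

        // Figures quoted in the thread; 33000 is an assumed reading of "33K".
        System.out.printf("store files vs ulimit: %.1f%%%n",
                100.0 * usedFraction(33000L, 65535L));
    }
}
```

On those assumed figures the store files alone would account for about half of the limit. Sockets, WAL files and anything else the process holds open come on top of that, and the thread does not say how many of those there were.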

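On the question of why the process stayed up: a JVM normally exits only once every non-daemon thread has finished, unless something calls System.exit or Runtime.halt. That is why a stack trace taken before "kill -9" is useful, since it shows which threads were still running. Whether a leftover non-daemon thread was the cause here is not established in this thread. The sketch below only illustrates the general idea from inside a JVM; the class name NonDaemonThreads and the thread name example-worker are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of HBase.
final class NonDaemonThreads {

    // Names of live threads that are not daemons; these keep a JVM running.
    static List<String> liveNonDaemonNames() {
        List<String> names = new ArrayList<>();
        for (Map.Entry<Thread, StackTraceElement[]> entry
                : Thread.getAllStackTraces().entrySet()) {
            Thread t = entry.getKey();
            if (t.isAlive() && !t.isDaemon()) {
                names.add(t.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) throws InterruptedException {
        // A short-lived non-daemon thread, so the list has something to show.
        Thread worker = new Thread(() -> {
            try {
                Thread.sleep(500);
            } catch (InterruptedException ignored) {
                // Nothing to clean up in this example.
            }
        }, "example-worker");
        worker.start();

        System.out.println("non-daemon threads: " + liveNonDaemonNames());
        worker.join();
    }
}
```

For a process that is already stuck, running jstack against its pid gives the same information together with full stack traces.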