hbase-user mailing list archives

From Ted Yu <yuzhih...@gmail.com>
Subject Re: "java.net.SocketException: Too many open files" AbortRegionServer But not ShutDown
Date Tue, 03 Jan 2017 15:05:45 GMT
Switching to user@

What version of HBase / Hadoop are you using?

Before issuing "kill -9", did you capture a stack trace of the region server
process?
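A thread dump (the stack trace asked about above) is the usual way to see what keeps an aborted process alive: unless something calls System.exit(), a JVM stays up as long as any non-daemon thread is alive. The sketch below is a hypothetical helper, not part of HBase or the JDK. It assumes the JDK's `jstack` is on the PATH and is run as the same user as the RegionServer, and it assumes the HotSpot thread header layout `"name" #id [daemon] prio=...` (older JDKs omit `#id`, which the regex tolerates).

```python
import re
import subprocess
import sys
from collections import Counter

# Assumed HotSpot thread-dump header lines, e.g.:
#   "main" #1 prio=5 os_prio=0 tid=... nid=... waiting on condition [...]
#   "Reference Handler" #2 daemon prio=10 os_prio=0 tid=... in Object.wait() [...]
# Older JDKs omit the "#<id>" part, so it is optional here.
HEADER = re.compile(r'^"(?P<name>[^"]*)" (?:#\d+ )?(?P<daemon>daemon )?prio=')
STATE = re.compile(r"java\.lang\.Thread\.State: (?P<state>\w+)")


def capture_thread_dump(pid):
    """Run `jstack -l <pid>` and return its output (requires jstack on PATH)."""
    result = subprocess.run(
        ["jstack", "-l", str(pid)],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return result.stdout


def summarize_dump(dump):
    """Return (Counter of Thread.State values, names of non-daemon threads)."""
    states = Counter(m.group("state") for m in STATE.finditer(dump))
    non_daemon = []
    for line in dump.splitlines():
        m = HEADER.match(line)
        # VM-internal threads ("VM Thread" os_prio=...) have no "prio=" and are skipped.
        if m and not m.group("daemon"):
            non_daemon.append(m.group("name"))
    return states, non_daemon


if __name__ == "__main__" and len(sys.argv) > 1:
    states, non_daemon = summarize_dump(capture_thread_dump(int(sys.argv[1])))
    print("thread states:", dict(states))
    print("non-daemon threads:", non_daemon)
```

Pass the RegionServer pid as the only argument. Any non-daemon thread still listed after the abort message is a candidate for what kept the process running; the full dump is worth attaching to the thread as well.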

Have you read 'Limits on Number of Files and Processes' under
http://hbase.apache.org/book.html#basic.prerequisites ?
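"Too many open files" means the process reached its per-process open-file limit, which is what the HBase book section above is about. On Linux, a running RegionServer's descriptor count can be compared with its effective limit through `/proc`. A minimal sketch, assuming Linux and read access to `/proc/<pid>` (same user or root); the function names are hypothetical, not HBase or Hadoop APIs.

```python
import os
import sys


def open_fd_count(pid):
    """Number of file descriptors currently open by `pid` (Linux /proc only)."""
    return len(os.listdir("/proc/%d/fd" % pid))


def max_open_files(pid):
    """(soft, hard) 'Max open files' limits of `pid` as strings; may be 'unlimited'."""
    with open("/proc/%d/limits" % pid) as limits:
        for line in limits:
            if line.startswith("Max open files"):
                # Line looks like: "Max open files   1024   4096   files"
                fields = line.split()
                return fields[3], fields[4]
    raise RuntimeError("no 'Max open files' line for pid %d" % pid)


if __name__ == "__main__":
    # Pass the RegionServer pid; defaults to this script's own pid.
    pid = int(sys.argv[1]) if len(sys.argv) > 1 else os.getpid()
    soft, hard = max_open_files(pid)
    print("pid %d: %d open fds (soft limit %s, hard limit %s)"
          % (pid, open_fd_count(pid), soft, hard))
```

Sampling this periodically shows whether the count creeps toward the soft limit (suggesting a descriptor leak) or the limit is simply too low for the workload; the HBase book section above covers the recommended limits.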

On Tue, Jan 3, 2017 at 6:56 AM, Weizhan Zeng <qgweizhan@gmail.com> wrote:

> Hi guys,
> I ran into an issue on one of my RegionServers (RS).
> After the SocketException happened, the RS should have shut down, but 8
> hours later I found it still alive and had to end the process with kill -9.
>
> Here is my RegionServer log:
>
> At 01:58 AM, the SocketException happened:
>
>     [2017-01-02T01:58:00.469+08:00] [INFO] hdfs.DFSClient : Exception in createBlockOutputStream java.net.SocketException: Too many open files
>         at sun.nio.ch.Net.socket0(Native Method)
>         at sun.nio.ch.Net.socket(Net.java:423)
>         at sun.nio.ch.Net.socket(Net.java:416)
>         at sun.nio.ch.SocketChannelImpl.<init>(SocketChannelImp.java:104)
>
> Also at 01:58 AM, the RegionServer aborted itself and began closing regions.
>
>     [2017-01-02T01:58:00.632+08:00] [INFO] regionserver.HRegionServer : aborting server HBASE-VENUS-149106.hadoop.local,16020,1482236933819
>     [2017-01-02T01:58:00.632+08:00] [INFO] client.ConnectionManager$HConnectionImplementation : Closing zookeeper sessionid=0x456f9b55fda457b
>     [2017-01-02T01:58:00.632+08:00] [INFO] regionserver.HStore : Closed f
>
>     [2017-01-02T01:59:18.067+08:00] [INFO] regionserver.HRegionServer$MovedRegionsCleaner : Chore: MovedRegionsCleaner for region HBASE-VENUS-149106.hadoop.local,16020,1482236933819 was stopped
>     [2017-01-02T01:59:18.225+08:00] [INFO] regionserver.Replication : Normal source for cluster 1: Total replicated edits: 39081044, currently replicating from: hdfs://venus/hbase/oldWALs/HBASE-VENUS-149106.hadoop.local%2C16020%2C1482236933819.default.1483293299516 at position: 0
>
>     [2017-01-02T01:59:18.225+08:00] [INFO] regionserver.Replication : Sink: age in ms of last applied edit: 0, total replicated edits: 160769427
>
> Even after an hour, it was still logging lines like this:
>
>     [2017-01-02T02:04:18.225+08:00] [INFO] regionserver.Replication : Normal source for cluster 1: Total replicated edits: 39081044, currently replicating from: hdfs://venus/hbase/oldWALs/HBASE-VENUS-149106.hadoop.local%2C16020%2C1482236933819.default.1483293299516 at position: 0
>
> At 8 AM, it was still logging:
>
>     [2017-01-02T08:09:18.225+08:00] [INFO] regionserver.Replication : Sink: age in ms of last applied edit: 0, total replicated edits: 160769427
>     [2017-01-02T08:14:18.225+08:00] [INFO] regionserver.Replication : Normal source for cluster 1: Total replicated edits: 39081044, currently replicating
>
> Can anyone give me some tips on how to track this down? Thanks.
>
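The timeline in the log excerpts above can be checked mechanically: find the "aborting server" line and measure how long the process kept logging afterwards. A minimal sketch, assuming Python 3.7+ (for `datetime.fromisoformat`) and the `[timestamp] [LEVEL] logger : message` layout seen in these excerpts; the real log4j pattern on a given cluster may differ, and the helper names are hypothetical.

```python
import re
from datetime import datetime

# Assumed layout, taken from the excerpts above:
#   [2017-01-02T01:58:00.632+08:00] [INFO] regionserver.HRegionServer : aborting server ...
LOG_LINE = re.compile(
    r"^\[(?P<ts>[^\]]+)\] \[(?P<level>\w+)\] (?P<logger>\S+) : (?P<msg>.*)$"
)


def parse_line(line):
    """Return (timestamp, logger, message), or None for lines such as stack frames."""
    m = LOG_LINE.match(line)
    if m is None:
        return None
    return datetime.fromisoformat(m.group("ts")), m.group("logger"), m.group("msg")


def time_alive_after_abort(lines):
    """Time from the first 'aborting server' line to the last parseable line, or None."""
    abort_at = last_seen = None
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        ts, _logger, msg = parsed
        if abort_at is None and msg.startswith("aborting server"):
            abort_at = ts
        last_seen = ts
    return None if abort_at is None else last_seen - abort_at


# Three lines copied from the excerpts above.
SAMPLE = [
    "[2017-01-02T01:58:00.469+08:00] [INFO] hdfs.DFSClient : Exception in "
    "createBlockOutputStream java.net.SocketException: Too many open files",
    "[2017-01-02T01:58:00.632+08:00] [INFO] regionserver.HRegionServer : "
    "aborting server HBASE-VENUS-149106.hadoop.local,16020,1482236933819",
    "[2017-01-02T08:14:18.225+08:00] [INFO] regionserver.Replication : Normal "
    "source for cluster 1: Total replicated edits: 39081044, currently replicating",
]

if __name__ == "__main__":
    print(time_alive_after_abort(SAMPLE))  # 6:16:17.593000
```

This only measures up to the last quoted log line, not up to the moment the process was killed with kill -9.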
