hbase-user mailing list archives

From Ted Yu <yuzhih...@gmail.com>
Subject Re: zookeeper closing socket connection exception
Date Tue, 02 Jun 2015 16:21:38 GMT
How much heap did you give the region server?

How much total memory does the box have?

I guess you have read http://hbase.apache.org/book.html#jvm

If you're using JDK 1.7.0_60 or newer, you can consider using G1GC.
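
To see how close the pauses are to the limit, you can compare the JvmPauseMonitor entries in the log quoted below with the ZooKeeper session timeout (40000ms in your log). Here is a minimal Python sketch of that check. The function names and the SAMPLE string are made up for illustration; only the two log message formats come from your mail, and the quote stripping assumes the usual "> " email prefix.

```python
import re

# Matches the pause duration reported by HBase's JvmPauseMonitor.
PAUSE_RE = re.compile(r"pause of approximately\s+(\d+)\s*ms")
# Matches the timeout reported when ZooKeeper expires a session.
TIMEOUT_RE = re.compile(r"timeout of (\d+)ms exceeded")


def unquote(text):
    """Strip email quote markers and re-join wrapped lines into one string."""
    parts = [re.sub(r"^(>\s?)+", "", line).strip() for line in text.splitlines()]
    return " ".join(p for p in parts if p)


def summarize(text):
    """Return (pause durations in ms, session timeout in ms or None)."""
    flat = unquote(text)
    pauses = [int(ms) for ms in PAUSE_RE.findall(flat)]
    match = TIMEOUT_RE.search(flat)
    return pauses, (int(match.group(1)) if match else None)


# Hypothetical sample: two entries copied from the quoted log.
SAMPLE = """\
> 2015-06-01 20:02:05,151 WARN  [JvmPauseMonitor] util.JvmPauseMonitor:
> Detected pause in JVM or host machine (eg GC): pause of approximately
> 39206ms
>
> 2015-06-01 20:02:05,151 INFO  [SessionTracker] server.ZooKeeperServer:
> Expiring session 0x14da00e69e00001, timeout of 40000ms exceeded
"""

if __name__ == "__main__":
    pauses, timeout = summarize(SAMPLE)
    for ms in pauses:
        if timeout:
            print(f"pause {ms} ms = {ms / timeout:.0%} of the {timeout} ms timeout")
        else:
            print(f"pause {ms} ms (no session timeout found in the text)")
```

In your log a single ParNew collection paused for about 39 seconds, which by itself is nearly the whole 40000ms session timeout, so the session expiring at the same timestamp is not surprising.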

Cheers

On Tue, Jun 2, 2015 at 3:26 AM, jeevi tesh <jeevitesh.ms@gmail.com> wrote:

> First of all, thanks a lot for coming forward with a helping hand.
>
> Here are my answers, along with the questions you asked.
>
>
>
> How many zookeeper servers do you have? Or what is the number of clients
> you have running per host?
>
> Ans: I have only one Linux box, which is a single-node system.
>
> Basically, I have installed HBase on a single system.
>
>
>
> What is the configured value of maxClientCnxns in the ZooKeeper servers?
>
> Ans: We are using the default configuration. We have not introduced any new
> value in hbase-site.xml.
>
>
>
> Is the issue impacting clients only, or is it also impacting the
> RegionServers?
>
> Ans: In this case the region server, master node, and client are all on the
> same machine, because we have installed HBase on a single system.
>
>
> Have you looked into why the ZooKeeper server is no longer accepting
> connections?
>
> Ans: I checked the HBase logs from the moment my application broke.
> To me it looked like the JVM went into garbage collection and never
> came back, which resulted in the exception. Is my interpretation correct?
> Kindly let me know.
>
> Here is the complete log
>
> 2015-06-01 19:59:53,808 INFO  [pool-55-thread-1] master.HMaster: Master has
> completed initialization
>
> 2015-06-01 19:59:53,808 INFO  [main-EventThread] zookeeper.ClientCnxn:
> EventThread shut down
>
> 2015-06-01 20:00:46,431 INFO  [JvmPauseMonitor] util.JvmPauseMonitor:
> Detected pause in JVM or host machine (eg GC): pause of approximately
> 6885ms
>
> GC pool 'ParNew' had collection(s): count=1 time=7383ms
>
> 2015-06-01 20:00:46,431 INFO  [JvmPauseMonitor] util.JvmPauseMonitor:
> Detected pause in JVM or host machine (eg GC): pause of approximately
> 6886ms
>
> GC pool 'ParNew' had collection(s): count=1 time=7383ms
>
> 2015-06-01 20:00:47,032 WARN  [M:0;hadoop2:35923.oldLogCleaner]
> cleaner.CleanerChore: A file cleanerM:0;hadoop2:35923.oldLogCleaner is
> stopped, won't delete any more files
> in:file:/home/hadoop/hbaseDataDir/oldWALs
>
> 2015-06-01 20:02:05,148 WARN  [M:0;hadoop2:35923.oldLogCleaner]
> util.Sleeper: We slept 78116ms instead of 60000ms, this is likely due to a
> long garbage collecting pause and it's usually bad, see
> http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
>
> 2015-06-01 20:02:05,148 WARN  [M:0;hadoop2:35923.archivedHFileCleaner]
> util.Sleeper: We slept 78122ms instead of 60000ms, this is likely due to a
> long garbage collecting pause and it's usually bad, see
> http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
>
> 2015-06-01 20:02:05,149 WARN
> [hadoop2,35923,1432909409923-ClusterStatusChore] util.Sleeper: We slept
> 78128ms instead of 60000ms, this is likely due to a long garbage collecting
> pause and it's usually bad, see
> http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
>
> 2015-06-01 20:02:05,149 WARN  [RS:0;hadoop2:40129] util.Sleeper: We slept
> 39687ms instead of 3000ms, this is likely due to a long garbage collecting
> pause and it's usually bad, see
> http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
>
> 2015-06-01 20:02:05,151 WARN  [JvmPauseMonitor] util.JvmPauseMonitor:
> Detected pause in JVM or host machine (eg GC): pause of approximately
> 39206ms
>
> GC pool 'ParNew' had collection(s): count=1 time=39328ms
>
> 2015-06-01 20:02:05,151 WARN  [M:0;hadoop2:35923] util.Sleeper: We slept
> 39345ms instead of 100ms, this is likely due to a long garbage collecting
> pause and it's usually bad, see
> http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
>
> 2015-06-01 20:02:05,151 WARN  [JvmPauseMonitor] util.JvmPauseMonitor:
> Detected pause in JVM or host machine (eg GC): pause of approximately
> 39205ms
>
> GC pool 'ParNew' had collection(s): count=1 time=39328ms
>
> 2015-06-01 20:02:05,151 INFO  [SessionTracker] server.ZooKeeperServer:
> Expiring session 0x14da00e69e00001, timeout of 40000ms exceeded
>
> 2015-06-01 20:02:05,151 INFO  [RS:0;hadoop2:40129-SendThread(hadoop2:2181)]
> zookeeper.ClientCnxn: Client session timed out, have not heard from server
> in 52055ms for sessionid 0x14da00e69e00001, closing socket connection and
> attempting reconnect
>
> 2015-06-01 20:02:05,151 INFO  [RS:0;hadoop2:40129-SendThread(hadoop2:2181)]
> zookeeper.ClientCnxn: Client session timed out, have not heard from server
> in 52053ms for sessionid 0x14da00e69e00004, closing socket connection and
> attempting reconnect
>
> 2015-06-01 20:02:05,151 WARN
> [hadoop2,35923,1432909409923.splitLogManagerTimeoutMonitor] util.Sleeper:
> We slept 39713ms instead of 1000ms, this is likely due to a long garbage
> collecting pause and it's usually bad, see
> http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
>
> 2015-06-01 20:02:05,155 WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181]
> server.NIOServerCnxn: caught end of stream exception
>
> EndOfStreamException: Unable to read additional data from client sessionid
> 0x14da00e69e00001, likely client has closed socket
>
>           at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:220)
>           at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
>           at java.lang.Thread.run(Thread.java:745)
>
>
>
>
>
> On Tue, Jun 2, 2015 at 12:45 AM, jeevi tesh <jeevitesh.ms@gmail.com>
> wrote:
>
> > Hi,
> > I have run into this issue several times but am still not able to
> > resolve it. Kindly help me in this regard.
> > I have written a crawler that keeps running for several days. After
> > 4 days of continuous interaction between my application and the
> > database, the database fails to respond. I am not able to figure out
> > how things can suddenly go wrong after 4 days of running properly.
> > My configuration: HBase 0.96.2 on a single server,
> > JDK 1.7.
> >
> > The issue is the following error:
> > WARN  [http-bio-8080-exec-4-SendThread(hadoop2:2181)] zookeeper.ClientCnxn
> > (ClientCnxn.java:run(1089)) - Session 0x14da00e69e001ad for server null,
> > unexpected error, closing socket connection and attempting reconnect
> > java.net.ConnectException: Connection refused
> > When this exception happens, the only solution I have is to restart
> > HBase, which is not viable because that will corrupt my system data.
> >
>
