hbase-user mailing list archives

From Kevin <kevin.macksa...@gmail.com>
Subject Re: HBase logging only prints ulimit information
Date Thu, 24 May 2012 21:02:19 GMT
Good question. I did leave one thing out from my above solution.

For a while I had been getting log4j warning messages when using the hbase
shell, but I ignored them. Just recently I started to get log4j errors
about not having the RFA appender. I can't be sure when this started, but I
must have carried over my old log4j properties file from a previous
installation of HBase. I changed DRFA to RFA in the log4j properties file,
which fixed log4j and restored my log output. With logging back, I could
see that my regionservers' clocks were out of sync. I apologize for the
incomplete explanation.

-Kevin
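
[Editor's sketch] The DRFA/RFA mismatch described above can be checked mechanically: compare the appender names requested by the root logger setting against the appenders the properties file actually declares. The following is a minimal Python sketch, not an HBase tool. The function names, the sample properties excerpt, and the "INFO,RFA" root logger value are illustrative assumptions; how the root logger value reaches log4j on a given installation is not shown in this thread.

```python
import re

def defined_appenders(properties_text):
    """Return the set of names declared as log4j.appender.<NAME>=<class>."""
    names = set()
    for raw in properties_text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue  # skip blank lines and comments
        # Match only the declaration itself, not sub-properties such as
        # log4j.appender.DRFA.File=...
        m = re.match(r"log4j\.appender\.([A-Za-z0-9_]+)\s*[=:]", line)
        if m:
            names.add(m.group(1))
    return names

def missing_appenders(root_logger, properties_text):
    """root_logger looks like 'INFO,RFA'; the first item is the level."""
    parts = [p.strip() for p in root_logger.split(",")]
    wanted = [p for p in parts[1:] if p]
    have = defined_appenders(properties_text)
    return [name for name in wanted if name not in have]

# Hypothetical excerpt of an older properties file that only declares DRFA.
OLD_PROPS = """
log4j.appender.DRFA=org.apache.log4j.DailyRollingFileAppender
log4j.appender.DRFA.File=${hbase.log.dir}/${hbase.log.file}
log4j.appender.console=org.apache.log4j.ConsoleAppender
"""

print(missing_appenders("INFO,RFA", OLD_PROPS))   # ['RFA']
print(missing_appenders("INFO,DRFA", OLD_PROPS))  # []
```

A non-empty result means log4j is being asked for an appender that the file never defines, which matches the "no RFA appender" errors and the missing log output.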

On Thu, May 24, 2012 at 4:17 PM, Ian Varley <ivarley@salesforce.com> wrote:

> How did you end up figuring that out, Kevin? Was there a more ominous
> message in the logs about this? Should have logged something like:
>
> "WARNING: Server foo has been rejected; Reported time is too far out of
> sync with master"
>
> FWIW, HBASE-5770<http://issues.apache.org/jira/browse/HBASE-5770> (Jira's
> down, BTW) adds a lower "warning" threshold you can set so you start
> getting these warnings before it results in servers actually bailing out.
> Would that have helped you here?
>
> Ian
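
[Editor's sketch] The two-threshold idea Ian describes (warn at a lower skew, reject at a higher one) can be illustrated as below. This is a Python sketch of the logic only, not HBase's implementation. The function name and the threshold values are placeholders chosen for the example; check your own configuration for the real settings.

```python
def classify_skew(master_ms, reported_ms, warn_ms, max_ms):
    """Classify a regionserver's reported clock against the master's clock.

    All arguments are in milliseconds. warn_ms and max_ms are caller-supplied
    thresholds (hypothetical values are used below).
    """
    skew = abs(master_ms - reported_ms)
    if skew > max_ms:
        return "reject"  # too far out of sync: the server would be refused
    if skew > warn_ms:
        return "warn"    # still accepted, but worth logging a warning
    return "ok"

# Placeholder thresholds for illustration only.
WARN_MS = 10_000
MAX_MS = 30_000

master_now = 1_337_893_339_000  # arbitrary epoch milliseconds
for drift in (2_000, 15_000, 45_000):
    print(drift, classify_skew(master_now, master_now + drift, WARN_MS, MAX_MS))
```

With a warning band below the rejection limit, a stopped ntpd shows up as "warn" entries while the drift is still small, before any server is turned away.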
>
> On May 24, 2012, at 1:03 PM, Kevin wrote:
>
> In case anyone comes across this, my issue was that ntpd had somehow
> stopped running on some of my regionservers.
>
> On Thu, May 24, 2012 at 2:32 PM, Kevin <kevin.macksamie@gmail.com> wrote:
>
> Hi,
>
> My distributed cluster has been running for a couple of months, and I just
> noticed that HBase has stopped rolling log files. The last log file before
> the most current one (hbase-hbase-*.log) is dated April 25
> (hbase-hbase-*.log.2012-04-25). The current log files show that HBase has
> been quietly restarting itself, and it seems it all started when the
> zookeeper connection closed at one point. I am trying to restart HBase to
> update it with my new coprocessors but only a couple of the servers are
> starting up. The log files only contain information from ulimit.
>
> Below is a little snippet of when I think things got out of hand. I
> understand from the HBase book that HBase prints the ulimit it is seeing
> as the first line of its logs, but what would make it stop there, with no
> further logging about errors that might make it shut down? We have been
> running scans on the data successfully without knowing about this issue,
> so maybe it just has to do with the logging mechanism?
>
> 2012-04-26 06:14:19,929 INFO org.apache.zookeeper.ZooKeeper: Session:
> 0x136eb113fe40006 closed
> 2012-04-26 06:14:19,929 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: stopping server
> slave2,60020,1335383643169; all regions closed.
> 2012-04-26 06:14:19,929 INFO org.apache.zookeeper.ClientCnxn: EventThread
> shut down
> 2012-04-26 06:14:19,929 DEBUG
> org.apache.hadoop.hbase.regionserver.wal.HLog: regionserver60020.logSyncer
> interrupted while waiting for sync requests
> 2012-04-26 06:14:19,929 INFO
> org.apache.hadoop.hbase.regionserver.wal.HLog: regionserver60020.logSyncer
> exiting
> 2012-04-26 06:14:19,930 DEBUG
> org.apache.hadoop.hbase.regionserver.wal.HLog: closing hlog writer in
> hdfs://master1:9000/hbase/.logs/slave2,60020,1335383643169
> 2012-04-26 06:14:19,976 DEBUG
> org.apache.hadoop.hbase.regionserver.wal.HLog: Moved 1 log files to
> /hbase/.oldlogs
> 2012-04-26 06:14:19,991 INFO org.apache.hadoop.hbase.regionserver.Leases:
> regionserver60020 closing leases
> 2012-04-26 06:14:19,991 INFO org.apache.hadoop.hbase.regionserver.Leases:
> regionserver60020 closed leases
> 2012-04-26 06:14:20,012 INFO org.apache.zookeeper.ClientCnxn: EventThread
> shut down
> 2012-04-26 06:14:20,013 INFO org.apache.zookeeper.ZooKeeper: Session:
> 0x136eb113fe40005 closed
> 2012-04-26 06:14:20,013 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: stopping server
> slave2,60020,1335383643169; zookeeper connection closed.
> 2012-04-26 06:14:20,013 DEBUG
> org.apache.hadoop.hbase.regionserver.CompactSplitThread: Waiting for Split
> Thread to finish...
> 2012-04-26 06:14:20,013 DEBUG
> org.apache.hadoop.hbase.regionserver.CompactSplitThread: Waiting for Large
> Compaction Thread to finish...
> 2012-04-26 06:14:20,013 DEBUG
> org.apache.hadoop.hbase.regionserver.CompactSplitThread: Waiting for Small
> Compaction Thread to finish...
> 2012-04-26 06:14:20,013 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: regionserver60020
> exiting
> 2012-04-26 06:14:20,013 INFO
> org.apache.hadoop.hbase.regionserver.ShutdownHook: Starting fs shutdown
> hook thread.
> 2012-04-26 06:14:20,014 INFO
> org.apache.hadoop.hbase.regionserver.ShutdownHook: Shutdown hook finished.
> Thu Apr 26 14:11:33 EDT 2012 Starting regionserver on slave2
> core file size          (blocks, -c) 0
> data seg size           (kbytes, -d) unlimited
> scheduling priority             (-e) 0
> file size               (blocks, -f) unlimited
> pending signals                 (-i) 127381
> max locked memory       (kbytes, -l) 64
> max memory size         (kbytes, -m) unlimited
> open files                      (-n) 32768
> pipe size            (512 bytes, -p) 8
> POSIX message queues     (bytes, -q) 819200
> real-time priority              (-r) 0
> stack size              (kbytes, -s) 10240
> cpu time               (seconds, -t) unlimited
> max user processes              (-u) 1024
> virtual memory          (kbytes, -v) unlimited
> file locks                      (-x) unlimited
> Thu Apr 26 14:13:11 EDT 2012 Starting regionserver on slave2
> core file size          (blocks, -c) 0
> data seg size           (kbytes, -d) unlimited
> scheduling priority             (-e) 0
> file size               (blocks, -f) unlimited
> pending signals                 (-i) 127381
> max locked memory       (kbytes, -l) 64
> max memory size         (kbytes, -m) unlimited
> open files                      (-n) 32768
> pipe size            (512 bytes, -p) 8
> POSIX message queues     (bytes, -q) 819200
> real-time priority              (-r) 0
> stack size              (kbytes, -s) 10240
> cpu time               (seconds, -t) unlimited
> max user processes              (-u) 1024
> virtual memory          (kbytes, -v) unlimited
> file locks                      (-x) unlimited
> Mon Apr 30 12:44:36 EDT 2012 Killing regionserver
> Mon Apr 30 12:55:01 EDT 2012 Starting regionserver on slave2
> core file size          (blocks, -c) 0
> data seg size           (kbytes, -d) unlimited
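
[Editor's sketch] The symptom in the excerpt above, a "Starting regionserver" banner followed only by ulimit output, can be detected by scanning for start banners that are never followed by a log4j-formatted line. This is a minimal Python sketch; the regular expressions are assumptions based solely on the line shapes quoted in this thread, and the function name is hypothetical.

```python
import re

# Banner such as: "Thu Apr 26 14:11:33 EDT 2012 Starting regionserver on slave2"
START = re.compile(
    r"^(\w{3} \w{3} [ \d]\d \d\d:\d\d:\d\d \w+ \d{4}) Starting (\w+) on (\S+)"
)
# log4j line such as: "2012-04-26 06:14:19,929 INFO ..."
LOG4J = re.compile(r"^\d{4}-\d\d-\d\d \d\d:\d\d:\d\d,\d{3} ")

def silent_starts(lines):
    """Return timestamps of start banners with no log4j output after them."""
    silent = []
    current = None  # timestamp of the most recent banner still lacking log4j output
    for line in lines:
        m = START.match(line)
        if m:
            if current is not None:
                silent.append(current)
            current = m.group(1)
        elif LOG4J.match(line):
            current = None  # the daemon did log something after starting
    if current is not None:
        silent.append(current)
    return silent

sample = """\
2012-04-26 06:14:20,014 INFO org.apache.hadoop.hbase.regionserver.ShutdownHook: Shutdown hook finished.
Thu Apr 26 14:11:33 EDT 2012 Starting regionserver on slave2
core file size          (blocks, -c) 0
open files                      (-n) 32768
Thu Apr 26 14:13:11 EDT 2012 Starting regionserver on slave2
core file size          (blocks, -c) 0
""".splitlines()

print(silent_starts(sample))
```

Run against an unquoted copy of the log file (without the "> " prefixes), every timestamp it returns marks a start attempt that produced no regular log output, which is a hint to look at the logging configuration rather than at HBase itself.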
