hbase-dev mailing list archives

From Lars George <lars.geo...@gmail.com>
Subject Health Script does not stop region server
Date Sat, 04 Feb 2017 14:40:36 GMT
Hi,

I tried the supplied `healthcheck.sh`, but did not have snmpd running.
That caused the script to take a long time to error out, longer than
the 10 seconds the check was meant to run. That resets the check, so
it keeps reporting the error but never stops the server:

2017-02-04 05:55:08,962 INFO
[regionserver/slave-1.internal.larsgeorge.com/10.0.10.10:16020]
hbase.HealthCheckChore: Health Check Chore runs every 10sec
2017-02-04 05:55:08,975 INFO
[regionserver/slave-1.internal.larsgeorge.com/10.0.10.10:16020]
hbase.HealthChecker: HealthChecker initialized with script at
/opt/hbase/bin/healthcheck.sh, timeout=60000

...

2017-02-04 05:55:50,435 INFO
[slave-1.internal.larsgeorge.com,16020,1486216506007_ChoreService_1]
hbase.HealthCheckChore: Health status at 412837hrs, 55mins, 50sec :
ERROR check link, OK: disks ok,

2017-02-04 05:55:50,436 INFO
[slave-1.internal.larsgeorge.com,16020,1486216506007_ChoreService_1]
hbase.ScheduledChore: Chore: CompactionChecker missed its start time
2017-02-04 05:55:50,437 INFO
[slave-1.internal.larsgeorge.com,16020,1486216506007_ChoreService_1]
hbase.ScheduledChore: Chore:
slave-1.internal.larsgeorge.com,16020,1486216506007-MemstoreFlusherChore
missed its start time
2017-02-04 05:55:50,438 INFO
[slave-1.internal.larsgeorge.com,16020,1486216506007_ChoreService_2]
hbase.ScheduledChore: Chore: HealthChecker missed its start time
2017-02-04 05:56:20,522 INFO
[slave-1.internal.larsgeorge.com,16020,1486216506007_ChoreService_2]
hbase.HealthCheckChore: Health status at 412837hrs, 56mins, 20sec :
ERROR check link, OK: disks ok,

2017-02-04 05:56:20,523 INFO
[slave-1.internal.larsgeorge.com,16020,1486216506007_ChoreService_2]
hbase.ScheduledChore: Chore: HealthChecker missed its start time
2017-02-04 05:56:50,600 INFO
[slave-1.internal.larsgeorge.com,16020,1486216506007_ChoreService_2]
hbase.HealthCheckChore: Health status at 412837hrs, 56mins, 50sec :
ERROR check link, OK: disks ok,

2017-02-04 05:56:50,600 INFO
[slave-1.internal.larsgeorge.com,16020,1486216506007_ChoreService_2]
hbase.ScheduledChore: Chore: HealthChecker missed its start time
2017-02-04 05:57:20,681 INFO
[slave-1.internal.larsgeorge.com,16020,1486216506007_ChoreService_1]
hbase.HealthCheckChore: Health status at 412837hrs, 57mins, 20sec :
ERROR check link, OK: disks ok,

2017-02-04 05:57:20,681 INFO
[slave-1.internal.larsgeorge.com,16020,1486216506007_ChoreService_1]
hbase.ScheduledChore: Chore: HealthChecker missed its start time
2017-02-04 05:57:50,763 INFO
[slave-1.internal.larsgeorge.com,16020,1486216506007_ChoreService_1]
hbase.HealthCheckChore: Health status at 412837hrs, 57mins, 50sec :
ERROR check link, OK: disks ok,

2017-02-04 05:57:50,764 INFO
[slave-1.internal.larsgeorge.com,16020,1486216506007_ChoreService_1]
hbase.ScheduledChore: Chore: HealthChecker missed its start time
2017-02-04 05:58:20,844 INFO
[slave-1.internal.larsgeorge.com,16020,1486216506007_ChoreService_1]
hbase.HealthCheckChore: Health status at 412837hrs, 58mins, 20sec :
ERROR check link, OK: disks ok,

2017-02-04 05:58:20,844 INFO
[slave-1.internal.larsgeorge.com,16020,1486216506007_ChoreService_1]
hbase.ScheduledChore: Chore: HealthChecker missed its start time
2017-02-04 05:58:50,923 INFO
[slave-1.internal.larsgeorge.com,16020,1486216506007_ChoreService_1]
hbase.HealthCheckChore: Health status at 412837hrs, 58mins, 50sec :
ERROR check link, OK: disks ok,

2017-02-04 05:58:50,923 INFO
[slave-1.internal.larsgeorge.com,16020,1486216506007_ChoreService_1]
hbase.ScheduledChore: Chore: HealthChecker missed its start time
2017-02-04 05:59:21,017 INFO
[slave-1.internal.larsgeorge.com,16020,1486216506007_ChoreService_2]
hbase.HealthCheckChore: Health status at 412837hrs, 59mins, 21sec :
ERROR check link, OK: disks ok,

2017-02-04 05:59:21,018 INFO
[slave-1.internal.larsgeorge.com,16020,1486216506007_ChoreService_2]
hbase.ScheduledChore: Chore: HealthChecker missed its start time
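
Here is my guess at what is going on. It assumes the chore only stops the
server after a threshold of failures (3 by default) that all fall within
a window of threshold x chore period (3 x 10 sec = 30 sec here), and that
it starts counting again from one when a failure lands outside that
window. The reports above arrive about 30.08 sec apart, just over that
window, so the count would never get past one. Below is a minimal Python
sketch of that policy. The names `FailureWindow` and `first_stop` are
made up, and this is not the actual HBase code:

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class FailureWindow:
    """Hypothetical failure-window policy: stop once `threshold` failures
    have been seen within `threshold * period_ms` of the first one."""
    threshold: int
    period_ms: int
    count: int = 0
    window_start_ms: int = 0

    @property
    def window_ms(self) -> int:
        return self.threshold * self.period_ms

    def record_failure(self, now_ms: int) -> bool:
        """Record one failed check at `now_ms`; return True if we should stop."""
        if self.count == 0:
            # First failure: open a new window.
            self.count = 1
            self.window_start_ms = now_ms
            return False
        if now_ms - self.window_start_ms < self.window_ms:
            # Still inside the window: count it and compare to the threshold.
            self.count += 1
            return self.count >= self.threshold
        # Failure arrived after the window closed: start over at one.
        self.count = 1
        self.window_start_ms = now_ms
        return False


def first_stop(failure_times_ms: Iterable[int], threshold: int = 3,
               period_ms: int = 10_000) -> Optional[int]:
    """Return the time at which a stop would be triggered, or None."""
    policy = FailureWindow(threshold, period_ms)
    for t in failure_times_ms:
        if policy.record_failure(t):
            return t
    return None


if __name__ == "__main__":
    # Script finishes within the 10 sec period: stop on the third failure.
    print(first_stop([0, 10_000, 20_000]))  # prints 20000
    # Failures ~30.08 sec apart, as in the log above: never stops.
    print(first_stop([i * 30_080 for i in range(8)]))  # prints None
```

So a slow script alone is enough to defeat the stop logic, even though
every single check reports ERROR.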

That seems like a bug, no?

Lars
