hbase-dev mailing list archives

From Ted Yu <yuzhih...@gmail.com>
Subject Re: Health Script does not stop region server
Date Sat, 04 Feb 2017 15:30:10 GMT
Running the command from the script locally (on Mac):

$ /usr/bin/snmpwalk -t 5 -Oe  -Oq  -Os -v 1 -c public localhost if
Timeout: No Response from localhost
$ echo $?
1

Looks like the script should parse the output from snmpwalk and provide
some hint if an unexpected result is reported.
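
A minimal sketch of what that could look like, assuming a wrapper function
in the script. The name `check_link` and the message wording are
hypothetical; they are not taken from the shipped healthcheck.sh. It
captures the command's output and exit status, and prints an ERROR line
carrying the reason instead of a bare failure:

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the shipped healthcheck.sh.
# Runs the given command, captures its combined output and exit status,
# and prints a one-line verdict that includes the reason for a failure.
check_link() {
  local out status
  out=$("$@" 2>&1)
  status=$?
  if [ "$status" -ne 0 ]; then
    # e.g. snmpwalk exits 1 with "Timeout: No Response from localhost"
    echo "ERROR check link: command exited with status $status: $out"
    return 1
  fi
  if [ -z "$out" ]; then
    # Exit status 0 but nothing to parse is also unexpected.
    echo "ERROR check link: command returned no output"
    return 1
  fi
  echo "OK: link ok"
  return 0
}

# Usage with the command from above (left commented out here):
# check_link /usr/bin/snmpwalk -t 5 -Oe -Oq -Os -v 1 -c public localhost if
```

With snmpd not running, the ERROR line would then include the
"Timeout: No Response" text, which makes the cause visible in the log.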

Cheers

On Sat, Feb 4, 2017 at 6:40 AM, Lars George <lars.george@gmail.com> wrote:

> Hi,
>
> I tried the supplied `healthcheck.sh`, but did not have snmpd running.
> That caused the script to take a long time to error out, which exceeded
> the 10 seconds the check was meant to run. That resets the check and
> it keeps reporting the error, but never stops the servers:
>
> 2017-02-04 05:55:08,962 INFO
> [regionserver/slave-1.internal.larsgeorge.com/10.0.10.10:16020]
> hbase.HealthCheckChore: Health Check Chore runs every 10sec
> 2017-02-04 05:55:08,975 INFO
> [regionserver/slave-1.internal.larsgeorge.com/10.0.10.10:16020]
> hbase.HealthChecker: HealthChecker initialized with script at
> /opt/hbase/bin/healthcheck.sh, timeout=60000
>
> ...
>
> 2017-02-04 05:55:50,435 INFO
> [slave-1.internal.larsgeorge.com,16020,1486216506007_ChoreService_1]
> hbase.HealthCheckChore: Health status at 412837hrs, 55mins, 50sec :
> ERROR check link, OK: disks ok,
>
> 2017-02-04 05:55:50,436 INFO
> [slave-1.internal.larsgeorge.com,16020,1486216506007_ChoreService_1]
> hbase.ScheduledChore: Chore: CompactionChecker missed its start time
> 2017-02-04 05:55:50,437 INFO
> [slave-1.internal.larsgeorge.com,16020,1486216506007_ChoreService_1]
> hbase.ScheduledChore: Chore:
> slave-1.internal.larsgeorge.com,16020,1486216506007-MemstoreFlusherChore
> missed its start time
> 2017-02-04 05:55:50,438 INFO
> [slave-1.internal.larsgeorge.com,16020,1486216506007_ChoreService_2]
> hbase.ScheduledChore: Chore: HealthChecker missed its start time
> 2017-02-04 05:56:20,522 INFO
> [slave-1.internal.larsgeorge.com,16020,1486216506007_ChoreService_2]
> hbase.HealthCheckChore: Health status at 412837hrs, 56mins, 20sec :
> ERROR check link, OK: disks ok,
>
> 2017-02-04 05:56:20,523 INFO
> [slave-1.internal.larsgeorge.com,16020,1486216506007_ChoreService_2]
> hbase.ScheduledChore: Chore: HealthChecker missed its start time
> 2017-02-04 05:56:50,600 INFO
> [slave-1.internal.larsgeorge.com,16020,1486216506007_ChoreService_2]
> hbase.HealthCheckChore: Health status at 412837hrs, 56mins, 50sec :
> ERROR check link, OK: disks ok,
>
> 2017-02-04 05:56:50,600 INFO
> [slave-1.internal.larsgeorge.com,16020,1486216506007_ChoreService_2]
> hbase.ScheduledChore: Chore: HealthChecker missed its start time
> 2017-02-04 05:57:20,681 INFO
> [slave-1.internal.larsgeorge.com,16020,1486216506007_ChoreService_1]
> hbase.HealthCheckChore: Health status at 412837hrs, 57mins, 20sec :
> ERROR check link, OK: disks ok,
>
> 2017-02-04 05:57:20,681 INFO
> [slave-1.internal.larsgeorge.com,16020,1486216506007_ChoreService_1]
> hbase.ScheduledChore: Chore: HealthChecker missed its start time
> 2017-02-04 05:57:50,763 INFO
> [slave-1.internal.larsgeorge.com,16020,1486216506007_ChoreService_1]
> hbase.HealthCheckChore: Health status at 412837hrs, 57mins, 50sec :
> ERROR check link, OK: disks ok,
>
> 2017-02-04 05:57:50,764 INFO
> [slave-1.internal.larsgeorge.com,16020,1486216506007_ChoreService_1]
> hbase.ScheduledChore: Chore: HealthChecker missed its start time
> 2017-02-04 05:58:20,844 INFO
> [slave-1.internal.larsgeorge.com,16020,1486216506007_ChoreService_1]
> hbase.HealthCheckChore: Health status at 412837hrs, 58mins, 20sec :
> ERROR check link, OK: disks ok,
>
> 2017-02-04 05:58:20,844 INFO
> [slave-1.internal.larsgeorge.com,16020,1486216506007_ChoreService_1]
> hbase.ScheduledChore: Chore: HealthChecker missed its start time
> 2017-02-04 05:58:50,923 INFO
> [slave-1.internal.larsgeorge.com,16020,1486216506007_ChoreService_1]
> hbase.HealthCheckChore: Health status at 412837hrs, 58mins, 50sec :
> ERROR check link, OK: disks ok,
>
> 2017-02-04 05:58:50,923 INFO
> [slave-1.internal.larsgeorge.com,16020,1486216506007_ChoreService_1]
> hbase.ScheduledChore: Chore: HealthChecker missed its start time
> 2017-02-04 05:59:21,017 INFO
> [slave-1.internal.larsgeorge.com,16020,1486216506007_ChoreService_2]
> hbase.HealthCheckChore: Health status at 412837hrs, 59mins, 21sec :
> ERROR check link, OK: disks ok,
>
> 2017-02-04 05:59:21,018 INFO
> [slave-1.internal.larsgeorge.com,16020,1486216506007_ChoreService_2]
> hbase.ScheduledChore: Chore: HealthChecker missed its start time
>
> That seems like a bug, no?
>
> Lars
>
