ambari-issues mailing list archives

From "Hudson (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (AMBARI-16914) Ambari uses too small a window for region server shutdown
Date Tue, 14 Jun 2016 00:27:57 GMT

    [ https://issues.apache.org/jira/browse/AMBARI-16914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15328702#comment-15328702 ]

Hudson commented on AMBARI-16914:
---------------------------------

FAILURE: Integrated in Ambari-trunk-Commit #5070 (See [https://builds.apache.org/job/Ambari-trunk-Commit/5070/])
AMBARI-16914. Ambari uses too small a window for region server shutdown (aonishuk: [http://git-wip-us.apache.org/repos/asf?p=ambari.git&a=commit&h=e0c9dc1f41d47eca52b840456bf1e51763912a44])
* ambari-server/src/main/resources/common-services/HBASE/0.96.0.2.0/configuration/hbase-env.xml
* ambari-server/src/main/resources/common-services/AMBARI_METRICS/0.1.0/configuration/ams-hbase-env.xml


> Ambari uses too small a window for region server shutdown
> ---------------------------------------------------------
>
>                 Key: AMBARI-16914
>                 URL: https://issues.apache.org/jira/browse/AMBARI-16914
>             Project: Ambari
>          Issue Type: Bug
>          Components: ambari-web
>    Affects Versions: 2.2.1
>            Reporter: Shankar Venkataraman
>         Attachments: AMBARI-16914.patch
>
>
> Ambari seems to issue a formal shutdown to a region server but quickly (after 30 seconds)
> follows it up with SIGKILL. On a fully loaded HBase system with about 200 regions per region
> server and an active transaction flow, there is no way an RS can stop in 30 seconds. This has
> caused many issues in production, including a memstore corruption. Why not use the shutdown
> script that comes with HBase?
> 2016-05-24 15:36:19,191 - Execute['/usr/hdp/current/hbase-regionserver/bin/hbase-daemon.sh
--config /usr/hdp/current/hbase-regionserver/conf stop regionserver'] {'only_if': 'ambari-sudo.sh
 -H -E test -f /var/run/hbase/hbase-hbase-regionserver.pid && ps -p `ambari-sudo.sh
 -H -E cat /var/run/hbase/hbase-hbase-regionserver.pid` >/dev/null 2>&1', 'on_timeout':
'! ( ambari-sudo.sh  -H -E test -f /var/run/hbase/hbase-hbase-regionserver.pid &&
ps -p `ambari-sudo.sh  -H -E cat /var/run/hbase/hbase-hbase-regionserver.pid` >/dev/null
2>&1 ) || ambari-sudo.sh -H -E kill -9 `ambari-sudo.sh  -H -E cat /var/run/hbase/hbase-hbase-regionserver.pid`',
'timeout': 30, 'user': 'hbase'}
> 2016-05-24 15:36:50,982 - Executing '! ( ambari-sudo.sh  -H -E test -f /var/run/hbase/hbase-hbase-regionserver.pid
&& ps -p `ambari-sudo.sh  -H -E cat /var/run/hbase/hbase-hbase-regionserver.pid` >/dev/null
2>&1 ) || ambari-sudo.sh -H -E kill -9 `ambari-sudo.sh  -H -E cat /var/run/hbase/hbase-hbase-regionserver.pid`'.
Reason: Execution of 'ambari-sudo.sh su hbase -l -s /bin/bash -c 'export  PATH='"'"'/usr/sbin:/sbin:/usr/lib/ambari-server/*:/sbin:/usr/sbin:/bin:/usr/bin:/var/lib/ambari-agent'"'"'
; /usr/hdp/current/hbase-regionserver/bin/hbase-daemon.sh --config /usr/hdp/current/hbase-regionserver/conf
stop regionserver'' was killed due timeout after 30 seconds
> 2016-05-24 15:36:51,053 - File['/var/run/hbase/hbase-hbase-regionserver.pid'] {'action':
['delete']}
> 2016-05-24 15:36:51,054 - Deleting File['/var/run/hbase/hbase-hbase-regionserver.pid'
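
The sequence in the log above (a stop command, a 30-second 'timeout', then the 'on_timeout' kill -9) can be illustrated with a minimal Python sketch. This is not Ambari's code: stop_process and grace_seconds are hypothetical names, and the sketch assumes a POSIX system where SIGTERM stands in for the graceful hbase-daemon.sh stop.

```python
import signal
import subprocess


def stop_process(proc, grace_seconds):
    """Ask proc to exit, and escalate to SIGKILL only after grace_seconds.

    proc is a subprocess.Popen object (hypothetical stand-in for the
    region server). Returns "graceful" if the process exited within the
    grace period, "killed" if it had to be force-killed.
    """
    # Polite request first: the process may flush state and exit cleanly.
    proc.send_signal(signal.SIGTERM)
    try:
        proc.wait(timeout=grace_seconds)
        return "graceful"
    except subprocess.TimeoutExpired:
        # Grace period expired: this is the point where data can be lost.
        proc.kill()  # sends SIGKILL on POSIX
        proc.wait()  # reap the child so it does not linger as a zombie
        return "killed"
```

With a grace period shorter than the time the process needs to finish its shutdown work, the function always ends in the "killed" branch, which is the behavior the reporter describes for a busy region server and a 30-second window.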



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
