ambari-issues mailing list archives

From "Myroslav Papirkovskyi (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (AMBARI-15714) Express Upgrade hung after FAILED step is retried
Date Tue, 05 Apr 2016 13:21:25 GMT

     [ https://issues.apache.org/jira/browse/AMBARI-15714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Myroslav Papirkovskyi updated AMBARI-15714:
-------------------------------------------
    Status: Patch Available  (was: Open)

> Express Upgrade hung after FAILED step is retried
> -------------------------------------------------
>
>                 Key: AMBARI-15714
>                 URL: https://issues.apache.org/jira/browse/AMBARI-15714
>             Project: Ambari
>          Issue Type: Bug
>          Components: ambari-server
>    Affects Versions: 2.2.2
>            Reporter: Myroslav Papirkovskyi
>            Assignee: Myroslav Papirkovskyi
>            Priority: Critical
>             Fix For: 2.2.2
>
>         Attachments: AMBARI-15714.patch
>
>
> *Steps:*
> # Start Express Upgrade from HDP 2.4.0.0 to 2.4.2.0-130
> # Continue until the Hive Metastore backup message appears, then click Proceed
> # Stop the Ambari agent on one of the HBase RegionServer hosts (os-r7-kwjvku-ambari-eu-4-4.novalocal on the current cluster)
> # Wait for the Express Upgrade (EU) to report failure (status HOLDING_TIMEDOUT)
> # Start ambari-agent on the RegionServer host and wait 60 seconds for heartbeats to resume
> # Retry the failed step in the EU wizard
>  
> *Result*
> The EU hangs.
> The ambari-server log shows the following:
> {code}
> 05 Apr 2016 08:09:26,526  WARN [ambari-hearbeat-monitor] HeartbeatMonitor:157 - Heartbeat
lost from host os-r7-kwjvku-ambari-eu-4-4.novalocal
> 05 Apr 2016 08:09:26,563  WARN [ambari-hearbeat-monitor] HeartbeatMonitor:172 - Setting
component state to UNKNOWN for component SECONDARY_NAMENODE on os-r7-kwjvku-ambari-eu-4-4.novalocal
> 05 Apr 2016 08:09:26,566  WARN [ambari-hearbeat-monitor] HeartbeatMonitor:172 - Setting
component state to UNKNOWN for component HISTORYSERVER on os-r7-kwjvku-ambari-eu-4-4.novalocal
> 05 Apr 2016 08:09:26,568  WARN [ambari-hearbeat-monitor] HeartbeatMonitor:172 - Setting
component state to UNKNOWN for component HIVE_METASTORE on os-r7-kwjvku-ambari-eu-4-4.novalocal
> 05 Apr 2016 08:09:26,571  WARN [ambari-hearbeat-monitor] HeartbeatMonitor:172 - Setting
component state to UNKNOWN for component WEBHCAT_SERVER on os-r7-kwjvku-ambari-eu-4-4.novalocal
> 05 Apr 2016 08:09:26,574  WARN [ambari-hearbeat-monitor] HeartbeatMonitor:172 - Setting
component state to UNKNOWN for component HIVE_SERVER on os-r7-kwjvku-ambari-eu-4-4.novalocal
> 05 Apr 2016 08:09:26,577  WARN [ambari-hearbeat-monitor] HeartbeatMonitor:172 - Setting
component state to UNKNOWN for component OOZIE_SERVER on os-r7-kwjvku-ambari-eu-4-4.novalocal
> 05 Apr 2016 08:09:26,580  WARN [ambari-hearbeat-monitor] HeartbeatMonitor:172 - Setting
component state to UNKNOWN for component ZOOKEEPER_SERVER on os-r7-kwjvku-ambari-eu-4-4.novalocal
> 05 Apr 2016 08:09:26,583  WARN [ambari-hearbeat-monitor] HeartbeatMonitor:172 - Setting
component state to UNKNOWN for component DRPC_SERVER on os-r7-kwjvku-ambari-eu-4-4.novalocal
> 05 Apr 2016 08:09:26,585  WARN [ambari-hearbeat-monitor] HeartbeatMonitor:172 - Setting
component state to UNKNOWN for component KAFKA_BROKER on os-r7-kwjvku-ambari-eu-4-4.novalocal
> 05 Apr 2016 08:09:26,589  WARN [ambari-hearbeat-monitor] HeartbeatMonitor:172 - Setting
component state to UNKNOWN for component SPARK_JOBHISTORYSERVER on os-r7-kwjvku-ambari-eu-4-4.novalocal
> 05 Apr 2016 08:09:26,592  WARN [ambari-hearbeat-monitor] HeartbeatMonitor:172 - Setting
component state to UNKNOWN for component METRICS_GRAFANA on os-r7-kwjvku-ambari-eu-4-4.novalocal
> 05 Apr 2016 08:09:26,595  WARN [ambari-hearbeat-monitor] HeartbeatMonitor:172 - Setting
component state to UNKNOWN for component DATANODE on os-r7-kwjvku-ambari-eu-4-4.novalocal
> 05 Apr 2016 08:09:26,597  WARN [ambari-hearbeat-monitor] HeartbeatMonitor:172 - Setting
component state to UNKNOWN for component NFS_GATEWAY on os-r7-kwjvku-ambari-eu-4-4.novalocal
> 05 Apr 2016 08:09:26,600  WARN [ambari-hearbeat-monitor] HeartbeatMonitor:172 - Setting
component state to UNKNOWN for component NODEMANAGER on os-r7-kwjvku-ambari-eu-4-4.novalocal
> 05 Apr 2016 08:09:26,603  WARN [ambari-hearbeat-monitor] HeartbeatMonitor:172 - Setting
component state to UNKNOWN for component HBASE_REGIONSERVER on os-r7-kwjvku-ambari-eu-4-4.novalocal
> 05 Apr 2016 08:09:26,606  WARN [ambari-hearbeat-monitor] HeartbeatMonitor:172 - Setting
component state to UNKNOWN for component SUPERVISOR on os-r7-kwjvku-ambari-eu-4-4.novalocal
> 05 Apr 2016 08:09:26,609  WARN [ambari-hearbeat-monitor] HeartbeatMonitor:172 - Setting
component state to UNKNOWN for component FLUME_HANDLER on os-r7-kwjvku-ambari-eu-4-4.novalocal
> 05 Apr 2016 08:09:26,611  WARN [ambari-hearbeat-monitor] HeartbeatMonitor:172 - Setting
component state to UNKNOWN for component SPARK_THRIFTSERVER on os-r7-kwjvku-ambari-eu-4-4.novalocal
> 05 Apr 2016 08:09:26,615  WARN [ambari-hearbeat-monitor] HeartbeatMonitor:172 - Setting
component state to UNKNOWN for component METRICS_MONITOR on os-r7-kwjvku-ambari-eu-4-4.novalocal
> 05 Apr 2016 08:09:26,618  WARN [ambari-hearbeat-monitor] HeartbeatMonitor:172 - Setting
component state to UNKNOWN for component HST_AGENT on os-r7-kwjvku-ambari-eu-4-4.novalocal
> 05 Apr 2016 08:09:27,571  INFO [ambari-action-scheduler] ActionScheduler:702 - Host:os-r7-kwjvku-ambari-eu-4-4.novalocal,
role:HBASE_REGIONSERVER, actionId:13-29 timed out
> 05 Apr 2016 08:09:28,750  INFO [ambari-action-scheduler] ActionScheduler:702 - Host:os-r7-kwjvku-ambari-eu-4-4.novalocal,
role:HBASE_REGIONSERVER, actionId:13-29 timed out
> 05 Apr 2016 08:09:28,751  WARN [ambari-action-scheduler] ActionScheduler:704 - Host:os-r7-kwjvku-ambari-eu-4-4.novalocal,
role:HBASE_REGIONSERVER, actionId:13-29 expired
> 05 Apr 2016 08:09:28,789 ERROR [ambari-action-scheduler] ServiceComponentHostImpl:1030
- Can't handle ServiceComponentHostEvent event at current state, serviceComponentName=HBASE_REGIONSERVER,
hostName=os-r7-kwjvku-ambari-eu-4-4.novalocal, currentState=UNKNOWN, eventType=HOST_SVCCOMP_OP_FAILED,
event=EventType: HOST_SVCCOMP_OP_FAILED
> 05 Apr 2016 08:09:28,789  WARN [ambari-action-scheduler] ActionScheduler:806 - Unable
to transition to failed state.
> org.apache.ambari.server.state.fsm.InvalidStateTransitionException: Invalid event: HOST_SVCCOMP_OP_FAILED
at UNKNOWN
>         at org.apache.ambari.server.state.fsm.StateMachineFactory.doTransition(StateMachineFactory.java:297)
>         at org.apache.ambari.server.state.fsm.StateMachineFactory.access$300(StateMachineFactory.java:39)
>         at org.apache.ambari.server.state.fsm.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:440)
>         at org.apache.ambari.server.state.svccomphost.ServiceComponentHostImpl.handleEvent(ServiceComponentHostImpl.java:1025)
>         at org.apache.ambari.server.actionmanager.ActionScheduler.transitionToFailedState(ActionScheduler.java:789)
>         at org.apache.ambari.server.actionmanager.ActionScheduler.processInProgressStage(ActionScheduler.java:710)
>         at org.apache.ambari.server.actionmanager.ActionScheduler.doWork(ActionScheduler.java:289)
>         at org.apache.ambari.server.actionmanager.ActionScheduler.run(ActionScheduler.java:196)
>         at java.lang.Thread.run(Thread.java:745)
> 05 Apr 2016 08:09:28,790  INFO [ambari-action-scheduler] ActionScheduler:717 - Removing
command from queue, host=os-r7-kwjvku-ambari-eu-4-4.novalocal, commandId=13-29
> 05 Apr 2016 08:09:59,571  WARN [qtp-ambari-agent-418] SecurityFilter:103 - Request https://os-r7-kwjvku-ambari-eu-4-5.novalocal:8440/ca
doesn't match any pattern.
> 05 Apr 2016 08:09:59,571  WARN [qtp-ambari-agent-418] SecurityFilter:62 - This request
is not allowed on this port: https://os-r7-kwjvku-ambari-eu-4-5.novalocal:8440/ca
> 5 Apr 2016 08:10:00,761  INFO [qtp-ambari-agent-418] HeartBeatHandler:400 - agentOsType
= centos7
> 05 Apr 2016 08:10:00,983  INFO [qtp-ambari-agent-418] HostImpl:285 - Received host registration,
host=[hostname=os-r7-kwjvku-ambari-eu-4-4,fqdn=os-r7-kwjvku-ambari-eu-4-4.novalocal,domain=novalocal,architecture=x86_64,processorcount=2,physicalprocessorcount=2,osname=centos,osversion=7.0.1406,osfamily=redhat,memory=16269820,uptime_hours=2,mounts=(available=24033216,mountpoint=/,used=2165564,percent=9%,size=26198780,device=/dev/vda1,type=xfs)(available=8119336,mountpoint=/dev,used=0,percent=0%,size=8119336,device=devtmpfs,type=devtmpfs)(available=8134900,mountpoint=/dev/shm,used=8,percent=1%,size=8134908,device=tmpfs,type=tmpfs)(available=8093468,mountpoint=/run,used=41440,percent=1%,size=8134908,device=tmpfs,type=tmpfs)(available=234937192,mountpoint=/grid/0,used=9839132,percent=5%,size=257899908,device=/dev/vdb,type=ext4)]
> , registrationTime=1459843800761, agentVersion=2.2.2.0
> 05 Apr 2016 08:10:00,983  INFO [qtp-ambari-agent-418] TopologyManager:316 - TopologyManager.onHostRegistered:
Entering
> 05 Apr 2016 08:10:00,984  INFO [qtp-ambari-agent-418] TopologyManager:318 - TopologyManager.onHostRegistered:
host = os-r7-kwjvku-ambari-eu-4-4.novalocal is already associated with the cluster or is currently
being processed
> 05 Apr 2016 08:10:01,127  INFO [qtp-ambari-agent-418] HeartBeatHandler:467 - Recovery
configuration set to RecoveryConfig{, type=AUTO_START, maxCount=6, windowInMinutes=60, retryGap=5,
maxLifetimeCount=1024, disabledComponents=, enabledComponents=METRICS_COLLECTOR}
> 05 Apr 2016 08:10:02,289  WARN [ambari-action-scheduler] ActionScheduler:695 - Detected
ambari-agent restart during command execution.The command has been aborted.Execution command
details: host: os-r7-kwjvku-ambari-eu-4-4.novalocal, role: HBASE_REGIONSERVER, actionId: 13-29
> 05 Apr 2016 08:10:03,489  WARN [ambari-action-scheduler] ActionScheduler:695 - Detected
ambari-agent restart during command execution.The command has been aborted.Execution command
details: host: os-r7-kwjvku-ambari-eu-4-4.novalocal, role: HBASE_REGIONSERVER, actionId: 13-29
> 05 Apr 2016 08:10:04,685  WARN [ambari-action-scheduler] ActionScheduler:695 - Detected
ambari-agent restart during command execution.The command has been aborted.Execution command
details: host: os-r7-kwjvku-ambari-eu-4-4.novalocal, role: HBASE_REGIONSERVER, actionId: 13-29
> 05 Apr 2016 08:10:05,911  WARN [ambari-action-scheduler] ActionScheduler:695 - Detected
ambari-agent restart during command execution.The command has been aborted.Execution command
details: host: os-r7-kwjvku-ambari-eu-4-4.novalocal, role: HBASE_REGIONSERVER, actionId: 13-29
> 05 Apr 2016 08:10:07,124  WARN [ambari-action-scheduler] ActionScheduler:695 - Detected
ambari-agent restart during command execution.The command has been aborted.Execution command
details: host: os-r7-kwjvku-ambari-eu-4-4.novalocal, role: HBASE_REGIONSERVER, actionId: 13-29
> 05 Apr 2016 08:10:08,336  WARN [ambari-action-scheduler] ActionScheduler:695 - Detected
ambari-agent restart during command execution.The command has been aborted.Execution command
details: host: os-r7-kwjvku-ambari-eu-4-4.novalocal, role: HBASE_REGIONSERVER, actionId: 13-29
> 05 Apr 2016 08:10:09,570  WARN [ambari-action-scheduler] ActionScheduler:695 - Detected
ambari-agent restart during command execution.The command has been aborted.Execution command
details: host: os-r7-kwjvku-ambari-eu-4-4.novalocal, role: HBASE_REGIONSERVER, actionId: 13-29
> 05 Apr 2016 08:10:10,857  WARN [ambari-action-scheduler] ActionScheduler:695 - Detected
ambari-agent restart during command execution.The command has been aborted.Execution command
details: host: os-r7-kwjvku-ambari-eu-4-4.novalocal, role: HBASE_REGIONSERVER, actionId: 13-29
> 05 Apr 2016 08:10:12,066  WARN [ambari-action-scheduler] ActionScheduler:695 - Detected
ambari-agent restart during command execution.The command has been aborted.Execution command
details: host: os-r7-kwjvku-ambari-eu-4-4.novalocal, role: HBASE_REGIONSERVER, actionId: 13-29
> 05 Apr 2016 08:10:13,295  WARN [ambari-action-scheduler] ActionScheduler:695 - Detected
ambari-agent restart during command execution.The command has been aborted.Execution command
details: host: os-r7-kwjvku-ambari-eu-4-4.novalocal, role: HBASE_REGIONSERVER, actionId: 13-29
> 05 Apr 2016 08:10:14,284  INFO [ambari-heartbeat-processor-0] HeartbeatProcessor:605
- State of service component HISTORYSERVER of service MAPREDUCE2 of cluster cl1 has changed
from UNKNOWN to INSTALLED at host os-r7-kwjvku-ambari-eu-4-4.novalocal according to STATUS_COMMAND
report
> 05 Apr 2016 08:10:14,290  INFO [ambari-heartbeat-processor-0] HeartbeatProcessor:605
- State of service component WEBHCAT_SERVER of service HIVE of cluster cl1 has changed from
UNKNOWN to INSTALLED at host os-r7-kwjvku-ambari-eu-4-4.novalocal according to STATUS_COMMAND
report
> 05 Apr 2016 08:10:14,294  INFO [ambari-heartbeat-processor-0] HeartbeatProcessor:605
- State of service component KAFKA_BROKER of service KAFKA of cluster cl1 has changed from
UNKNOWN to INSTALLED at host os-r7-kwjvku-ambari-eu-4-4.novalocal according to STATUS_COMMAND
report
> 05 Apr 2016 08:10:14,298  INFO [ambari-heartbeat-processor-0] HeartbeatProcessor:605
- State of service component SPARK_JOBHISTORYSERVER of service SPARK of cluster cl1 has changed
from UNKNOWN to INSTALLED at host os-r7-kwjvku-ambari-eu-4-4.novalocal according to STATUS_COMMAND
report
> 05 Apr 2016 08:10:14,303  INFO [ambari-heartbeat-processor-0] HeartbeatProcessor:605
- State of service component METRICS_GRAFANA of service AMBARI_METRICS of cluster cl1 has
changed from UNKNOWN to STARTED at host os-r7-kwjvku-ambari-eu-4-4.novalocal according to
STATUS_COMMAND report
> 05 Apr 2016 08:10:14,307  INFO [ambari-heartbeat-processor-0] HeartbeatProcessor:605
- State of service component DATANODE of service HDFS of cluster cl1 has changed from UNKNOWN
to STARTED at host os-r7-kwjvku-ambari-eu-4-4.novalocal according to STATUS_COMMAND report
> 05 Apr 2016 08:10:14,313  INFO [ambari-heartbeat-processor-0] HeartbeatProcessor:605
- State of service component NFS_GATEWAY of service HDFS of cluster cl1 has changed from UNKNOWN
to STARTED at host os-r7-kwjvku-ambari-eu-4-4.novalocal according to STATUS_COMMAND report
> 05 Apr 2016 08:10:14,316  INFO [ambari-heartbeat-processor-0] HeartbeatProcessor:605
- State of service component HBASE_REGIONSERVER of service HBASE of cluster cl1 has changed
from UNKNOWN to STARTED at host os-r7-kwjvku-ambari-eu-4-4.novalocal according to STATUS_COMMAND
report
> 05 Apr 2016 08:10:14,321  INFO [ambari-heartbeat-processor-0] HeartbeatProcessor:605
- State of service component SUPERVISOR of service STORM of cluster cl1 has changed from UNKNOWN
to INSTALLED at host os-r7-kwjvku-ambari-eu-4-4.novalocal according to STATUS_COMMAND report
> 05 Apr 2016 08:10:14,325  INFO [ambari-heartbeat-processor-0] HeartbeatProcessor:605
- State of service component OOZIE_SERVER of service OOZIE of cluster cl1 has changed from
UNKNOWN to INSTALLED at host os-r7-kwjvku-ambari-eu-4-4.novalocal according to STATUS_COMMAND
report
> 05 Apr 2016 08:10:14,329  INFO [ambari-heartbeat-processor-0] HeartbeatProcessor:605
- State of service component METRICS_MONITOR of service AMBARI_METRICS of cluster cl1 has
changed from UNKNOWN to STARTED at host os-r7-kwjvku-ambari-eu-4-4.novalocal according to
STATUS_COMMAND report
> 05 Apr 2016 08:10:14,333  INFO [ambari-heartbeat-processor-0] HeartbeatProcessor:605
- State of service component NODEMANAGER of service YARN of cluster cl1 has changed from UNKNOWN
to INSTALLED at host os-r7-kwjvku-ambari-eu-4-4.novalocal according to STATUS_COMMAND report
> 05 Apr 2016 08:10:14,337  INFO [ambari-heartbeat-processor-0] HeartbeatProcessor:605
- State of service component HIVE_METASTORE of service HIVE of cluster cl1 has changed from
UNKNOWN to INSTALLED at host os-r7-kwjvku-ambari-eu-4-4.novalocal according to STATUS_COMMAND
report
> 05 Apr 2016 08:10:14,343  INFO [ambari-heartbeat-processor-0] HeartbeatProcessor:605
- State of service component HIVE_SERVER of service HIVE of cluster cl1 has changed from UNKNOWN
to INSTALLED at host os-r7-kwjvku-ambari-eu-4-4.novalocal according to STATUS_COMMAND report
> 05 Apr 2016 08:10:14,348  INFO [ambari-heartbeat-processor-0] HeartbeatProcessor:605
- State of service component ZOOKEEPER_SERVER of service ZOOKEEPER of cluster cl1 has changed
from UNKNOWN to STARTED at host os-r7-kwjvku-ambari-eu-4-4.novalocal according to STATUS_COMMAND
report
> 05 Apr 2016 08:10:14,351  INFO [ambari-heartbeat-processor-0] HeartbeatProcessor:605
- State of service component SPARK_THRIFTSERVER of service SPARK of cluster cl1 has changed
from UNKNOWN to INSTALLED at host os-r7-kwjvku-ambari-eu-4-4.novalocal according to STATUS_COMMAND
report
> 05 Apr 2016 08:10:14,354  INFO [ambari-heartbeat-processor-0] HeartbeatProcessor:605
- State of service component DRPC_SERVER of service STORM of cluster cl1 has changed from
UNKNOWN to INSTALLED at host os-r7-kwjvku-ambari-eu-4-4.novalocal according to STATUS_COMMAND
report
> 05 Apr 2016 08:10:14,359  INFO [ambari-heartbeat-processor-0] HeartbeatProcessor:605
- State of service component SECONDARY_NAMENODE of service HDFS of cluster cl1 has changed
from UNKNOWN to STARTED at host os-r7-kwjvku-ambari-eu-4-4.novalocal according to STATUS_COMMAND
report
> 05 Apr 2016 08:10:14,363  INFO [ambari-heartbeat-processor-0] HeartbeatProcessor:605
- State of service component FLUME_HANDLER of service FLUME of cluster cl1 has changed from
UNKNOWN to INSTALLED at host os-r7-kwjvku-ambari-eu-4-4.novalocal according to STATUS_COMMAND
report
> 05 Apr 2016 08:10:14,366  INFO [ambari-heartbeat-processor-0] HeartbeatProcessor:605
- State of service component HST_AGENT of service SMARTSENSE of cluster cl1 has changed from
UNKNOWN to STARTED at host os-r7-kwjvku-ambari-eu-4-4.novalocal according to STATUS_COMMAND
report
> 05 Apr 2016 08:10:14,499  WARN [ambari-action-scheduler] ActionScheduler:695 - Detected
ambari-agent restart during command execution.The command has been aborted.Execution command
details: host: os-r7-kwjvku-ambari-eu-4-4.novalocal, role: HBASE_REGIONSERVER, actionId: 13-29
> 05 Apr 2016 08:10:15,706  WARN [ambari-action-scheduler] ActionScheduler:695 - Detected
ambari-agent restart during command execution.The command has been aborted.Execution command
details: host: os-r7-kwjvku-ambari-eu-4-4.novalocal, role: HBASE_REGIONSERVER, actionId: 13-29
> 05 Apr 2016 08:10:16,894  WARN [ambari-action-scheduler] ActionScheduler:695 - Detected
ambari-agent restart during command execution.The command has been aborted.Execution command
details: host: os-r7-kwjvku-ambari-eu-4-4.novalocal, role: HBASE_REGIONSERVER, actionId: 13-29
> 05 Apr 2016 08:10:18,151  WARN [ambari-action-scheduler] ActionScheduler:695 - Detected
ambari-agent restart during command execution.The command has been aborted.Execution command
details: host: os-r7-kwjvku-ambari-eu-4-4.novalocal, role: HBASE_REGIONSERVER, actionId: 13-29
> {code}
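
The InvalidStateTransitionException in the log above can be illustrated with a minimal sketch of a table-driven state machine. This is not Ambari's code: the transition table, the STOPPING state, the HEARTBEAT_LOST event name, and the helper functions are all hypothetical. Only the UNKNOWN state, the HOST_SVCCOMP_OP_FAILED event, and the two message texts are taken from the log. The sketch assumes that UNKNOWN has no transition registered for HOST_SVCCOMP_OP_FAILED, which is what the exception message suggests.

```python
class InvalidStateTransitionError(Exception):
    """Raised when no transition is registered for (state, event)."""


# Hypothetical transition table, NOT Ambari's real one.
# Note that there is no entry for ("UNKNOWN", "HOST_SVCCOMP_OP_FAILED").
TRANSITIONS = {
    ("STOPPING", "HOST_SVCCOMP_OP_FAILED"): "STARTED",
    ("STOPPING", "HEARTBEAT_LOST"): "UNKNOWN",  # hypothetical event name
}


def do_transition(state, event):
    """Return the next state, or raise if the event is invalid in this state."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise InvalidStateTransitionError(
            f"Invalid event: {event} at {state}"
        ) from None


def transition_to_failed(state):
    """Try to mark a timed-out command as failed.

    If the transition is invalid, report it and leave the state unchanged,
    similar to the WARN that precedes the stack trace in the log.
    """
    try:
        return do_transition(state, "HOST_SVCCOMP_OP_FAILED")
    except InvalidStateTransitionError as exc:
        print(f"Unable to transition to failed state. ({exc})")
        return state


if __name__ == "__main__":
    # The heartbeat is lost while a command is in progress.
    state = do_transition("STOPPING", "HEARTBEAT_LOST")
    # The command then times out, but the component is already UNKNOWN.
    state = transition_to_failed(state)
    print(state)
```

In this sketch the component remains in UNKNOWN after the failed transition attempt. Whether that leftover state is what makes the retried step hang in Ambari is not established by the log alone.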



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
