ambari-dev mailing list archives

From Sebastian Toader <stoa...@hortonworks.com>
Subject Re: Review Request 43948: RM fails to start: IOException: /ats/active does not exist
Date Thu, 25 Feb 2016 13:14:49 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/43948/
-----------------------------------------------------------

(Updated Feb. 25, 2016, 2:14 p.m.)


Review request for Ambari, Alejandro Fernandez, Andrew Onischuk, Sumit Mohanty, and Sid Wagle.


Changes
-------

1. Skip directories listed in /var/lib/ambari-agent/data/.hdfs_resource_ignore
2. Optimize the code so that kinit is invoked fewer times
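
A minimal sketch of how change 1 might look, for illustration only. It assumes the ignore file holds one directory path per line; the actual file format is not stated in this request, and the helper names load_ignored_dirs and filter_ignored are hypothetical, not taken from hdfs_resource.py.

```python
import os

# Path named in the change list above.
IGNORE_FILE = "/var/lib/ambari-agent/data/.hdfs_resource_ignore"


def load_ignored_dirs(ignore_file=IGNORE_FILE):
    """Read the ignore file into a set of paths.

    Assumption: one directory per line; blank lines are skipped.
    A missing file means nothing is ignored.
    """
    if not os.path.exists(ignore_file):
        return set()
    with open(ignore_file) as f:
        return {line.strip() for line in f if line.strip()}


def filter_ignored(dirs, ignored):
    """Return the directories that are not listed in the ignore set, in order."""
    return [d for d in dirs if d not in ignored]
```

A caller would load the set once and pass every candidate directory through filter_ignored before acting on it.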


Bugs: AMBARI-15158
    https://issues.apache.org/jira/browse/AMBARI-15158


Repository: ambari


Description
-------

If ATS is installed, then Resource Manager, after starting, checks whether the directories where
ATS stores timeline data for active and completed applications exist in DFS. There might be
cases when RM comes up well before ATS has created these directories. In these situations
RM stops with the error message "IOException: /ats/active does not exist".

To avoid this situation, the Python script responsible for starting the RM component has
been modified to check for the existence of these directories before the RM process is
started. This check is performed only if ATS is installed and either yarn.timeline-service.entity-group-fs-store.active-dir
or yarn.timeline-service.entity-group-fs-store.done-dir is set.
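
The condition described above can be sketched roughly as follows. This is an illustration, not the code in the diff: the function names, the yarn_site dictionary, and the dir_exists callback are hypothetical, and the retry loop is an assumption (the description only says the directories are checked before RM starts).

```python
import time

# Property names quoted from the description above.
ACTIVE_DIR_KEY = "yarn.timeline-service.entity-group-fs-store.active-dir"
DONE_DIR_KEY = "yarn.timeline-service.entity-group-fs-store.done-dir"


def ats_dirs_to_check(yarn_site, ats_installed):
    """Return the ATS directories to verify, or [] when no check is needed."""
    if not ats_installed:
        return []
    return [yarn_site[key] for key in (ACTIVE_DIR_KEY, DONE_DIR_KEY)
            if yarn_site.get(key)]


def wait_for_dirs(dirs, dir_exists, retries=5, delay_seconds=0.0):
    """Poll until every directory exists; return those still missing.

    dir_exists is a caller-supplied callable (hypothetical) that reports
    whether a path exists in DFS. Retrying is an assumption of this sketch.
    """
    missing = list(dirs)
    for attempt in range(retries):
        missing = [d for d in missing if not dir_exists(d)]
        if not missing:
            break
        if attempt < retries - 1:
            time.sleep(delay_seconds)
    return missing
```

With this shape, the start script would launch RM only when wait_for_dirs returns an empty list, and otherwise report the missing paths.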


Diffs (updated)
-----

  ambari-common/src/main/python/resource_management/libraries/providers/hdfs_resource.py b73ae56

  ambari-server/src/main/resources/common-services/YARN/2.1.0.2.0/package/scripts/params_linux.py 2ef404d

  ambari-server/src/main/resources/common-services/YARN/2.1.0.2.0/package/scripts/resourcemanager.py ec7799e

Diff: https://reviews.apache.org/r/43948/diff/


Testing
-------

Manual testing:
1. Created secure and non-secure clusters with Blueprint where NN, RM and ATS were deployed to
different nodes. Both cases were tested: HDFS with webhdfs enabled and with webhdfs disabled.
2. Created a cluster using the UI where NN, RM and ATS were deployed to different nodes. The
cluster was then kerberized and tested in both cases: HDFS with webhdfs enabled and with
webhdfs disabled.

Python test results:
----------------------------------------------------------------------
Total run:902
Total errors:0
Total failures:0
OK


Thanks,

Sebastian Toader

