ambari-dev mailing list archives

From Sebastian Toader <stoa...@hortonworks.com>
Subject Re: Review Request 43948: RM fails to start: IOException: /ats/active does not exist
Date Thu, 25 Feb 2016 13:14:44 GMT


> On Feb. 24, 2016, 6 p.m., Andrew Onischuk wrote:
> > ambari-server/src/main/resources/common-services/YARN/2.1.0.2.0/package/scripts/resourcemanager.py, line 234
> > <https://reviews.apache.org/r/43948/diff/1/?file=1267791#file1267791line234>
> >
> >     Would it be better to move this to libraries/functions, in case we need it in other services?
> >
> >     It might also be better to name it waitForHdfsDirectoryCreated rather than checkHdfsDir, so it's easier to understand what the function does.

Renamed the function as suggested. I haven't spent time making the function generic enough
for libraries/functions, as I wanted to get this bug fixed and committed quickly. If this function
turns out to be useful in other places as well, then I think we can raise a separate JIRA to make
it generic (maybe by extending the HdfsResource class to provide this functionality).
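
A minimal sketch of what a generic helper along these lines might look like, assuming the existence check is injected by the caller (for example one backed by WebHDFS, or by running `hdfs dfs -test -d`). The function name follows the suggested waitForHdfsDirectoryCreated in snake_case; its parameters, default timeout and polling interval are hypothetical and are not taken from the actual patch or from Ambari's HdfsResource API.

```python
import time
from typing import Callable


def wait_for_hdfs_directory_created(
    path: str,
    exists: Callable[[str], bool],      # hypothetical caller-supplied check
    timeout_s: float = 300.0,           # assumed default, not from the patch
    interval_s: float = 5.0,            # assumed polling interval
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll until `path` exists or the timeout elapses.

    This sketch never talks to HDFS itself; `exists` decides how the
    directory is looked up. Returns True if the directory appeared,
    False if the deadline passed first.
    """
    deadline = clock() + timeout_s
    while True:
        if exists(path):
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)


# Usage: a fake check that reports the directory only on the third poll.
polls = {"n": 0}

def fake_exists(p: str) -> bool:
    polls["n"] += 1
    return polls["n"] >= 3

print(wait_for_hdfs_directory_created("/ats/active", fake_exists, sleep=lambda s: None))
```

Injecting `exists`, `sleep` and `clock` keeps the polling loop independent of how HDFS is reached (webhdfs enabled or disabled) and lets it be tested without a cluster.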


- Sebastian


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/43948/#review120519
-----------------------------------------------------------


On Feb. 24, 2016, 5:39 p.m., Sebastian Toader wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/43948/
> -----------------------------------------------------------
> 
> (Updated Feb. 24, 2016, 5:39 p.m.)
> 
> 
> Review request for Ambari, Alejandro Fernandez, Andrew Onischuk, Sumit Mohanty, and Sid Wagle.
> 
> 
> Bugs: AMBARI-15158
>     https://issues.apache.org/jira/browse/AMBARI-15158
> 
> 
> Repository: ambari
> 
> 
> Description
> -------
> 
> If ATS is installed, then after starting, the Resource Manager checks whether the directories where ATS stores timeline data for active and completed applications exist in DFS. There might be cases when RM comes up much earlier than ATS creates these directories. In these situations RM stops with the error message "IOException: /ats/active does not exist".
> 
> To avoid this situation, the Python script responsible for starting the RM component has been modified to check for the existence of these directories upfront, before the RM process is started. This check is performed only if ATS is installed and either yarn.timeline-service.entity-group-fs-store.active-dir or yarn.timeline-service.entity-group-fs-store.done-dir is set.
> 
> 
> Diffs
> -----
> 
>   ambari-server/src/main/resources/common-services/YARN/2.1.0.2.0/package/scripts/params_linux.py 2ef404d 
>   ambari-server/src/main/resources/common-services/YARN/2.1.0.2.0/package/scripts/resourcemanager.py ec7799e 
> 
> Diff: https://reviews.apache.org/r/43948/diff/
> 
> 
> Testing
> -------
> 
> Manual testing:
> 1. Created secure and non-secure clusters with Blueprint where NN, RM and ATS were deployed to different nodes. This was tested both with webhdfs enabled and with webhdfs disabled in HDFS.
> 2. Created a cluster using the UI where NN, RM and ATS were deployed to different nodes. After the cluster was kerberized, it was tested both with webhdfs enabled and with webhdfs disabled in HDFS.
> 
> Python tests results:
> ----------------------------------------------------------------------
> Total run:902
> Total errors:0
> Total failures:0
> OK
> 
> 
> Thanks,
> 
> Sebastian Toader
> 
>
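
The condition in the review request's description (check only when ATS is installed and at least one of the two entity-group-fs-store directories is configured) can be sketched as a small pure function. This is an illustration under assumptions, not the code in resourcemanager.py or params_linux.py: the `ats_installed` flag and the plain dict standing in for yarn-site are hypothetical inputs, and only the two property names come from the description.

```python
from typing import List, Mapping

# Property names as given in the review description.
ACTIVE_DIR_KEY = "yarn.timeline-service.entity-group-fs-store.active-dir"
DONE_DIR_KEY = "yarn.timeline-service.entity-group-fs-store.done-dir"


def ats_dirs_to_check(ats_installed: bool, yarn_site: Mapping[str, str]) -> List[str]:
    """Return the ATS directories RM should wait for before starting.

    `ats_installed` and `yarn_site` are hypothetical inputs for this sketch.
    An empty list means no check is needed.
    """
    if not ats_installed:
        return []
    dirs = []
    for key in (ACTIVE_DIR_KEY, DONE_DIR_KEY):
        # Treat missing, None, or blank values as "not set".
        value = (yarn_site.get(key) or "").strip()
        if value:
            dirs.append(value)
    return dirs


print(ats_dirs_to_check(True, {ACTIVE_DIR_KEY: "/ats/active"}))
```

Keeping the decision separate from the waiting logic means the "should we check at all" rule can be verified without touching HDFS.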

