ambari-dev mailing list archives

From "Dmitro Lisnichenko" <dlysniche...@hortonworks.com>
Subject Re: Review Request 40600: Service or component install fails when a non-ambari apt-get command is running
Date Mon, 23 Nov 2015 15:29:15 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/40600/#review107598
-----------------------------------------------------------

Ship it!

- Dmitro Lisnichenko


On Nov. 23, 2015, 5:23 p.m., Andrew Onischuk wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/40600/
> -----------------------------------------------------------
> 
> (Updated Nov. 23, 2015, 5:23 p.m.)
> 
> 
> Review request for Ambari and Dmitro Lisnichenko.
> 
> 
> Bugs: AMBARI-14017
>     https://issues.apache.org/jira/browse/AMBARI-14017
> 
> 
> Repository: ambari
> 
> 
> Description
> -------
> 
> PROBLEM  
> A user runs "apt-get check" from a cron job on their servers to check for
> broken dependencies. They report that this command may take up to two minutes
> to complete on various nodes in their cluster. While it runs, it holds a write
> lock on /var/lib/dpkg/lock, which locks the package database. During that
> interval, if Ambari is commanded to install a new component or perform other
> maintenance tasks on a cluster node that require access to the package
> database, the command will fail. Since the apt-get check runs from cron,
> apparently with some frequency, this represents a problem for ongoing
> maintenance, especially in large clusters.
> 
> It would be desirable if Ambari and/or the agent were more tolerant of
> locks on the package database.
> 
> The stack trace at failure follows  
> Traceback (most recent call last):
>   File "/var/lib/ambari-agent/cache/stacks/HDP/2.0.6/hooks/before-INSTALL/scripts/hook.py", line 37, in <module>
>     BeforeInstallHook().execute()
>   File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 219, in execute
>     method(env)
>   File "/var/lib/ambari-agent/cache/stacks/HDP/2.0.6/hooks/before-INSTALL/scripts/hook.py", line 33, in hook
>     install_repos()
>   File "/var/lib/ambari-agent/cache/stacks/HDP/2.0.6/hooks/before-INSTALL/scripts/repo_initialization.py", line 59, in install_repos
>     _alter_repo("create", params.repo_info, template)
>   File "/var/lib/ambari-agent/cache/stacks/HDP/2.0.6/hooks/before-INSTALL/scripts/repo_initialization.py", line 50, in _alter_repo
>     components = ubuntu_components, # ubuntu specific
>   File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 154, in __init__
>     self.env.run()
>   File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 152, in run
>     self.run_action(resource, action)
>   File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 118, in run_action
>     provider_action()
>   File "/usr/lib/python2.6/site-packages/resource_management/libraries/providers/repository.py", line 110, in action_create
>     retcode, out = checked_call(update_cmd_formatted, sudo=True, quiet=False)
>   File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 70, in inner
>     result = function(command, **kwargs)
>   File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 92, in checked_call
>     tries=tries, try_sleep=try_sleep)
>   File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 140, in _call_wrapper
>     result = _call(command, **kwargs_copy)
>   File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 291, in _call
>     raise Fail(err_msg)
> resource_management.core.exceptions.Fail: Execution of 'apt-get update -qq -o Dir::Etc::sourcelist=sources.list.d/HDP.list -o Dir::Etc::sourceparts=- -o APT::Get::List-Cleanup=0' returned 100. W: GPG error: http://public-repo-1.hortonworks.com HDP InRelease: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY B9733A7A07513CAD
> E: Could not get lock /var/lib/dpkg/lock - open (11: Resource temporarily unavailable)
> E: Unable to lock the administration directory (/var/lib/dpkg/), is another process using it?
> 
> IMPACT  
> User will not manage their cluster with Ambari if this cannot be
> fixed by the end of November.
> 
> EXPECTED  
> Ambari retries installations for some period of time
> 
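
The expected behavior above could be sketched roughly as follows. This is only an illustration, not the code in the diff under review: run_with_lock_retry, is_lock_error and the runner/sleep hooks are hypothetical names. The tries/try_sleep parameter names are borrowed from the traceback, and the marker strings are copied from the quoted apt-get output. Retrying only on lock errors is an assumption, so that unrelated failures still fail fast.

```python
import subprocess
import time

# Marker strings copied from the apt-get error output quoted in this thread.
LOCK_ERROR_MARKERS = (
    "Could not get lock",
    "Unable to lock the administration directory",
)


def is_lock_error(output):
    """Return True if the command output looks like a dpkg/apt lock failure."""
    return any(marker in output for marker in LOCK_ERROR_MARKERS)


def _run_subprocess(cmd):
    """Run cmd and return (returncode, combined stdout and stderr)."""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT, universal_newlines=True)
    out, _ = proc.communicate()
    return proc.returncode, out


def run_with_lock_retry(cmd, tries=5, try_sleep=30, runner=None, sleep=time.sleep):
    """Run cmd, retrying only while the failure looks like a package lock.

    runner and sleep are hypothetical hooks so the loop can be exercised
    without touching the real package manager.
    """
    if runner is None:
        runner = _run_subprocess
    for attempt in range(1, tries + 1):
        retcode, out = runner(cmd)
        if retcode == 0:
            return retcode, out
        # Give up immediately on non-lock errors, or when out of attempts.
        if not is_lock_error(out) or attempt == tries:
            raise RuntimeError("Execution of %r returned %d. %s" % (cmd, retcode, out))
        sleep(try_sleep)
```

With these defaults a command would be attempted up to five times, 30 seconds apart, which would cover the two-minute "apt-get check" window described in the report. The right values for a real cluster are a tuning question.
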
> ACTUAL  
> Ambari fails
> 
> ANALYSIS  
> I created a simple program based on the code at
> <http://beej.us/guide/bgipc/output/html/multipage/flocking.html> to write lock
> /var/lib/dpkg/lock on command, and then attempted a component install on a new
> node in a cluster. The install failed. After removing the lock, the
> installation succeeded. This is easily reproduced using a simple C program on
> a target node.
> 
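
For anyone who wants to repeat the reproduction without compiling C, a rough Python equivalent is sketched below. It is not the program used in the analysis: hold_write_lock and is_locked are hypothetical helper names, and the sketch assumes that an exclusive fcntl-style lock taken with Python's fcntl.lockf conflicts with the lock the package tools take on the same file.

```python
import fcntl
import os
import time


def hold_write_lock(path, seconds, sleep=time.sleep):
    """Take an exclusive fcntl lock on path, hold it for `seconds`, release it."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o640)
    try:
        fcntl.lockf(fd, fcntl.LOCK_EX)  # blocks until the lock is acquired
        try:
            sleep(seconds)  # simulate a long-running "apt-get check"
        finally:
            fcntl.lockf(fd, fcntl.LOCK_UN)
    finally:
        os.close(fd)


def is_locked(path):
    """Return True if another process holds a conflicting lock on path.

    Note: this probes by briefly taking the lock itself, and fcntl locks
    are per process, so it cannot see a lock held by the calling process.
    """
    fd = os.open(path, os.O_RDWR)
    try:
        try:
            fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except OSError:
            return True
        fcntl.lockf(fd, fcntl.LOCK_UN)
        return False
    finally:
        os.close(fd)
```

Running something like hold_write_lock("/var/lib/dpkg/lock", 120) as root on a test node, then triggering a component install from Ambari during those two minutes, should reproduce the failure described above.
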
> 
> Diffs
> -----
> 
>   ambari-agent/src/test/python/resource_management/TestPackageResource.py 18b2d00
>   ambari-common/src/main/python/resource_management/core/providers/package/__init__.py 7e532bc
>   ambari-common/src/main/python/resource_management/core/providers/package/apt.py ddd6952
>   ambari-common/src/main/python/resource_management/core/providers/package/zypper.py 3ff3dfd
>   ambari-common/src/main/python/resource_management/core/resources/packaging.py 1ca88af
> 
> Diff: https://reviews.apache.org/r/40600/diff/
> 
> 
> Testing
> -------
> 
> mvn clean test
> 
> 
> Thanks,
> 
> Andrew Onischuk
> 
>

