ambari-dev mailing list archives

From "Sumit Mohanty" <smoha...@hortonworks.com>
Subject Re: Review Request 42391: AMBARI-14704 : Restart storm fails with a metrics storm sink jar related error sometimes
Date Sat, 16 Jan 2016 20:37:18 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/42391/#review114878
-----------------------------------------------------------

Ship it!


It's OK to refactor the code later.

- Sumit Mohanty


On Jan. 16, 2016, 7:02 p.m., Aravindan Vijayan wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/42391/
> -----------------------------------------------------------
> 
> (Updated Jan. 16, 2016, 7:02 p.m.)
> 
> 
> Review request for Ambari, Alejandro Fernandez, Dmitro Lisnichenko, Dmytro Sen, Sumit Mohanty, and Sid Wagle.
> 
> 
> Bugs: AMBARI-14704
>     https://issues.apache.org/jira/browse/AMBARI-14704
> 
> 
> Repository: ambari
> 
> 
> Description
> -------
> 
> Error trace
> 
> Traceback (most recent call last):
>   File "/var/lib/ambari-agent/cache/common-services/STORM/0.9.1.2.1/package/scripts/drpc_server.py", line 130, in <module>
>     DrpcServer().execute()
>   File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 219, in execute
>     method(env)
>   File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 524, in restart
>     self.start(env, upgrade_type=upgrade_type)
>   File "/var/lib/ambari-agent/cache/common-services/STORM/0.9.1.2.1/package/scripts/drpc_server.py", line 62, in start
>     self.configure(env)
>   File "/var/lib/ambari-agent/cache/common-services/STORM/0.9.1.2.1/package/scripts/drpc_server.py", line 49, in configure
>     storm()
>   File "/usr/lib/python2.6/site-packages/ambari_commons/os_family_impl.py", line 89, in thunk
>     return fn(*args, **kwargs)
>   File "/var/lib/ambari-agent/cache/common-services/STORM/0.9.1.2.1/package/scripts/storm.py", line 105, in storm
>     only_if=format("ls {metric_collector_sink_jar}")
>   File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 154, in __init__
>     self.env.run()
>   File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 158, in run
>     self.run_action(resource, action)
>   File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 121, in run_action
>     provider_action()
>   File "/usr/lib/python2.6/site-packages/resource_management/core/providers/system.py", line 238, in action_run
>     tries=self.resource.tries, try_sleep=self.resource.try_sleep)
>   File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 70, in inner
>     result = function(command, **kwargs)
>   File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 92, in checked_call
>     tries=tries, try_sleep=try_sleep)
>   File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 140, in _call_wrapper
>     result = _call(command, **kwargs_copy)
>   File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 291, in _call
>     raise Fail(err_msg)
> resource_management.core.exceptions.Fail: Execution of 'ambari-sudo.sh ln -s /usr/lib/storm/lib/ambari-metrics-storm-sink*.jar /usr/hdp/current/storm-client/lib/ambari-metrics-storm-sink.jar' returned 1. ln: failed to create symbolic link '/usr/hdp/current/storm-client/lib/ambari-metrics-storm-sink.jar': File exists
> 
> 
> PROBLEM
> During a Storm component restart, we remove the symlink to the metrics sink jar in /usr/hdp/current/storm-client/lib and create a new symlink pointing to the new metrics jar version.
> When 2 (or more) storm-client components are present on the same host, a restart can occasionally hit a race condition: one component creates the symlink between the other component's Delete and Create symlink calls. The other component's Create symlink call then fails with "File exists", which causes its Start/Restart to fail.
> 
> 
> FIX
> Move the symlink creation and deletion logic to the storm-ui-server start script, since that is the only component that needs the metrics reporter jar.
> 
> 
> Diffs
> -----
> 
>   ambari-server/src/main/resources/common-services/STORM/0.9.1.2.1/package/scripts/storm.py 7000861
>   ambari-server/src/main/resources/common-services/STORM/0.9.1.2.1/package/scripts/ui_server.py 42f12fc
>   ambari-server/src/test/python/stacks/2.1/STORM/test_storm_ui_server.py 128b53f
> 
> Diff: https://reviews.apache.org/r/42391/diff/
> 
> 
> Testing
> -------
> 
> Manual testing done.
> 
> ambari-server python unit tests pass.
> 
> Submitted patch through apache.
> 
> 
> Thanks,
> 
> Aravindan Vijayan
> 
>
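
For readers following the PROBLEM section quoted above, here is a minimal Python sketch of the delete-then-create pattern and of one race-free alternative. It is NOT the change made in this review request (the patch moves the symlink logic into the storm-ui-server start script) and it does not use Ambari's resource_management API. The function names replace_symlink_naive and replace_symlink_atomically are hypothetical, chosen only for illustration, and the sketch assumes a POSIX filesystem where rename() replaces an existing destination in one step.

```python
import os
import tempfile


def replace_symlink_naive(target, link_path):
    """Delete-then-create, the pattern described in the PROBLEM section."""
    if os.path.lexists(link_path):
        os.remove(link_path)
    # Window: another process may create link_path right here, in which
    # case the symlink() call below fails with "File exists".
    os.symlink(target, link_path)


def replace_symlink_atomically(target, link_path):
    """Hypothetical alternative: build the link elsewhere, then rename it."""
    link_dir = os.path.dirname(os.path.abspath(link_path))
    # A private temporary directory next to the final location keeps the
    # rename on one filesystem and avoids name clashes between processes.
    tmp_dir = tempfile.mkdtemp(dir=link_dir)
    tmp_link = os.path.join(tmp_dir, "link")
    try:
        os.symlink(target, tmp_link)
        # On POSIX, rename() replaces an existing link_path in a single
        # step, so link_path is never missing and never "already exists".
        os.rename(tmp_link, link_path)
    finally:
        # Clean up if symlink() or rename() failed part-way.
        if os.path.lexists(tmp_link):
            os.remove(tmp_link)
        os.rmdir(tmp_dir)
```

With the second function, two components restarting at the same time would each end up with a valid link; whichever rename runs last wins, which is harmless when both point at the same jar version.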

