ambari-dev mailing list archives

From "Aravindan Vijayan" <avija...@hortonworks.com>
Subject Re: Review Request 42391: AMBARI-14704 : Restart storm fails with a metrics storm sink jar related error sometimes
Date Sun, 17 Jan 2016 00:55:41 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/42391/
-----------------------------------------------------------

(Updated Jan. 17, 2016, 12:55 a.m.)


Review request for Ambari, Alejandro Fernandez, Dmitro Lisnichenko, Dmytro Sen, Sumit Mohanty,
and Sid Wagle.


Changes
-------

Moved logic to separate method.


Bugs: AMBARI-14704
    https://issues.apache.org/jira/browse/AMBARI-14704


Repository: ambari


Description
-------

Error trace

Traceback (most recent call last):
  File "/var/lib/ambari-agent/cache/common-services/STORM/0.9.1.2.1/package/scripts/drpc_server.py", line 130, in <module>
    DrpcServer().execute()
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 219, in execute
    method(env)
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 524, in restart
    self.start(env, upgrade_type=upgrade_type)
  File "/var/lib/ambari-agent/cache/common-services/STORM/0.9.1.2.1/package/scripts/drpc_server.py", line 62, in start
    self.configure(env)
  File "/var/lib/ambari-agent/cache/common-services/STORM/0.9.1.2.1/package/scripts/drpc_server.py", line 49, in configure
    storm()
  File "/usr/lib/python2.6/site-packages/ambari_commons/os_family_impl.py", line 89, in thunk
    return fn(*args, **kwargs)
  File "/var/lib/ambari-agent/cache/common-services/STORM/0.9.1.2.1/package/scripts/storm.py", line 105, in storm
    only_if=format("ls {metric_collector_sink_jar}")
  File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 154, in __init__
    self.env.run()
  File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 158, in run
    self.run_action(resource, action)
  File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 121, in run_action
    provider_action()
  File "/usr/lib/python2.6/site-packages/resource_management/core/providers/system.py", line 238, in action_run
    tries=self.resource.tries, try_sleep=self.resource.try_sleep)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 70, in inner
    result = function(command, **kwargs)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 92, in checked_call
    tries=tries, try_sleep=try_sleep)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 140, in _call_wrapper
    result = _call(command, **kwargs_copy)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 291, in _call
    raise Fail(err_msg)
resource_management.core.exceptions.Fail: Execution of 'ambari-sudo.sh ln -s /usr/lib/storm/lib/ambari-metrics-storm-sink*.jar /usr/hdp/current/storm-client/lib/ambari-metrics-storm-sink.jar' returned 1. ln: failed to create symbolic link '/usr/hdp/current/storm-client/lib/ambari-metrics-storm-sink.jar': File exists


PROBLEM
During a Storm component restart, we remove the symlink to the metrics sink jar in
/usr/hdp/current/storm-client/lib and create a new symlink pointing to the new metrics jar version.
When two or more storm-client components are present on the same host, a restart can
occasionally hit a race condition: one component creates the symlink between the Delete and
Create symlink calls of the other component. The other component's Create symlink call then
fails with "File exists", which causes its Start/Restart to fail.
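
The interleaving described above can be reproduced deterministically. The following is a
minimal sketch, not code from the patch: it uses plain os calls instead of the Ambari
resource_management resources, and the helper names (delete_link, create_link, simulate_race)
and the jar file name are assumptions made for illustration only.

```python
import errno
import os
import tempfile


def delete_link(link_path):
    # Step 1 of the restart logic: drop the old symlink if it is present.
    if os.path.islink(link_path):
        os.unlink(link_path)


def create_link(target, link_path):
    # Step 2: create the new symlink. Raises OSError (EEXIST) if the
    # link already exists, just like "ln -s" without -f.
    os.symlink(target, link_path)


def simulate_race(workdir):
    """Replay the bad interleaving of two components, A and B, in one process."""
    target = os.path.join(workdir, "ambari-metrics-storm-sink-new.jar")  # hypothetical name
    link_path = os.path.join(workdir, "ambari-metrics-storm-sink.jar")
    open(target, "w").close()
    os.symlink(target, link_path)       # link left over from a previous start

    delete_link(link_path)              # A: delete
    delete_link(link_path)              # B: delete (nothing left to remove)
    create_link(target, link_path)      # B: create succeeds
    try:
        create_link(target, link_path)  # A: create fails, link already exists
    except OSError as exc:
        return exc.errno
    return None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        print(simulate_race(workdir) == errno.EEXIST)
```

Component A's create step fails only because component B ran between A's delete and create,
which is the window described above.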


FIX
Move the symlink creation and deletion logic to the storm-ui-server start script, since that
is the only component that needs the metrics reporter jar.
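
The idea of giving the symlink a single owner can be sketched as follows. This is an
illustrative outline under stated assumptions, not the contents of the patch: Component and
UiServer are stand-ins rather than the Ambari Script classes, refresh_sink_symlink is a
hypothetical helper, and plain os/glob calls replace the resource_management resources.

```python
import glob
import os
import tempfile


def refresh_sink_symlink(jar_pattern, link_path):
    """Hypothetical helper: repoint link_path at the first jar matching jar_pattern."""
    matches = sorted(glob.glob(jar_pattern))
    if not matches:
        return None  # no sink jar installed, so there is nothing to link
    if os.path.islink(link_path):
        os.unlink(link_path)
    os.symlink(matches[0], link_path)
    return matches[0]


class Component(object):
    """Stand-in for a Storm component script (not the Ambari Script class)."""

    def __init__(self, lib_dir):
        self.lib_dir = lib_dir

    def start(self):
        # Shared configuration would run here; it no longer touches the symlink.
        return None


class UiServer(Component):
    """Only this component manages the sink jar symlink."""

    def __init__(self, lib_dir, jar_pattern):
        Component.__init__(self, lib_dir)
        self.jar_pattern = jar_pattern

    def start(self):
        Component.start(self)
        link_path = os.path.join(self.lib_dir, "ambari-metrics-storm-sink.jar")
        return refresh_sink_symlink(self.jar_pattern, link_path)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        lib_dir = os.path.join(root, "lib")
        os.mkdir(lib_dir)
        open(os.path.join(root, "ambari-metrics-storm-sink-demo.jar"), "w").close()
        Component(lib_dir).start()  # other components leave lib_dir alone
        ui = UiServer(lib_dir, os.path.join(root, "ambari-metrics-storm-sink*.jar"))
        print(ui.start())
```

With one owner, no second component on the host can run the delete/create pair
concurrently, so the window for the race disappears without extra locking.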


Diffs (updated)
-----

  ambari-server/src/main/resources/common-services/STORM/0.9.1.2.1/package/scripts/storm.py 7000861 
  ambari-server/src/main/resources/common-services/STORM/0.9.1.2.1/package/scripts/ui_server.py 42f12fc 
  ambari-server/src/test/python/stacks/2.1/STORM/test_storm_ui_server.py 128b53f 

Diff: https://reviews.apache.org/r/42391/diff/


Testing
-------

Manual testing done.

ambari-server python unit tests pass.

Submitted patch through apache.


Thanks,

Aravindan Vijayan

