hadoop-yarn-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Steve Loughran (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-530) Define Service model strictly, implement AbstractService for robust subclassing, migrate yarn-common services
Date Wed, 15 May 2013 22:59:16 GMT

    [ https://issues.apache.org/jira/browse/YARN-530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13658961#comment-13658961

Steve Loughran commented on YARN-530:

h3. Service

bq. {{start()}} doesn't use {{enterState()}} API, so we don't capture the life-cycle change


bq. {{init()}}, {{start()}} and {{stop()}} aren't synchronized, so callers of {{getServiceState()}}
will get incorrect information if, let's say, {{innerInit}} is still in progress.

This is interesting. I've realised they need to be sync or you add an "stateChangeInProgress"
marker to ensure you eliminate the race of someone trying to {{stop()}} a service half-way
through {{start()}}. Some of the more complex state models make the 'starting', 'stopping'
states explicit, which is another option. 

* I'm going to make the methods {{synchronized}} to stop the risk of any race conditions -while
doing a best effort at keeping the notifications outside the {{synchronized}} block. The corner
case here is that when an init or start fails and {{stop()}} is called automatically: it will
notify its listeners inside the {{stop()}} call.

* I'm going to keep the state queries unsynced (reading a volatile), so that doesn't stop
things that only want to read and not manipulate service state from blocking.

bq. Rename {{inState}} to {{isInState}}?


bq. {{waitForServiceToStop}} seems redundant because we also have listeners, right? Sure one
is a blocking call while the other is async. I'd remove it unless there is some other strong
reason. May be we can implement an async utility using{{ getServiceState()}} and implement
a generic {{waitForServiceState}} instead of just for stop?

-let me add something that isn't on the critical path for the next alpha, as it isn't an API
change: an entry point to start a service by its name.
I've just added YARN-679 to show the use case here: an entry point to start any service by
its classname. That isn't ready to be committed (where are the tests!), but it shows the vision.
I'll see if I can implement it with the notifications.

bq. What is the use of 'blockers'?

An attempt to make the reason a service blocks visible, at least when it is consciously blocking
(e.g. spin/sleep waiting for manager node). Unconscious blocking, by way of blocking API calls,
will still happen. If the service can declare that they are blocked by something, then other
tooling can say "what is this service waiting for"

bq. May be LifecycleEvent move to top-level?

bq. Not part of your patch, but we may as well take this opportunity to fix this: Rename register
-> registerServiceListener? Similarly unregister.

h3. AbstractService

bq. The inner* method-names don't look good when using the service stuff. Shall we rename
it to to say, for e.g, innerInit to initService ?

bq. Mark all the super init, start, stop methods as final?

Gladly, though it does cause a mockito test to fail:

 Time elapsed: 247 sec  <<< FAILURE!
java.lang.AssertionError: null state in null class org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker$$EnhancerByMockitoWithCGLIB$$221b8e7
	at org.apache.hadoop.yarn.service.AbstractService.enterState(AbstractService.java:431)
	at org.apache.hadoop.yarn.service.AbstractService.init(AbstractService.java:151)
	at org.apache.hadoop.yarn.service.CompositeService.serviceInit(CompositeService.java:67)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.serviceInit(ResourceLocalizationService.java:240)
	at org.apache.hadoop.yarn.service.AbstractService.init(AbstractService.java:154)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testResourceRelease(TestResourceLocalizationService.java:239)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
	at org.junit.runners.BlockJUnit4ClassRunner.runNotIgnored(BlockJUnit4ClassRunner.java:79)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:71)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:49)
	at org.junit.runners.ParentRunner$3.run(ParentRunner.java:193)
	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:52)
	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:191)
	at org.junit.runners.ParentRunner.access$000(ParentRunner.java:42)
	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:184)
	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28)
	at org.junit.runners.ParentRunner.run(ParentRunner.java:236)
	at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:252)
	at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:141)
	at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:112)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.maven.surefire.util.ReflectionUtils.invokeMethodWithArray(ReflectionUtils.java:189)
	at org.apache.maven.surefire.booter.ProviderFactory$ProviderProxy.invoke(ProviderFactory.java:165)
	at org.apache.maven.surefire.booter.ProviderFactory.invokeProvider(ProviderFactory.java:85)
	at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:115)
	at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:75)
I don't see an obvious fix for this -need some help from a Mockito expert.

bq. Lets err out on null configuration in init().

bq. enterState(): Rename original to either oldState or previousState?

bq. stopQuietly, stopService wrappers are unnecessary?

h3. {{ServiceStateModel}}
bq. : init to init, start to start transitions should also be allowed - for reentrancy?

-I left out on the basis that if you are doing this then something may be wrong, but calling
stop without any state checks is absolutely critical to handle. 

If you think we should me make the others no-op too, it'll be trivial.

bq. {{ServiceStateModel}}: if {{Service.inState()}} is renamed, so should {{inState}} in this
class too.

bq. Rename 'original' to {{oldState}} or {{previousState}} and state to {currentState}?

done for {{original}}, the other is the name of the field -do you think I should rename that?

h3. {{ServiceOperarations}}

bq. Why the extra wrappers for init, start - those APIs already internally check if the transition
is valid or not right? Even so, stop doesn't have this 
extra transition-check while others have.
-removed, were originally there to handle the (more brittle) previous model.

bq. Little confused here: the tests indicate that on listener throwing exception doesn't disturb
any other listener or the service itself, but I can't understand this from the code - notifyListeners
doesn't do anything for listeners throwing exceptions.

That's left to whatever invokes the notification, here {{AbstractService}}, which downgrades
failures to warns. The tests verify that this happens.

bq. {{TestServiceOperations.testStopUnstarted}}: You have duplicate checks in there.
-pulled the entire test along with the operations tested

bq. {{TestCompositeService.testServiceLifecycleNoChildrenl}}: Typo on test-name

bq. Also, can you please fix the findbugs warnings? Tx

I'll see what the report says on the next run
> Define Service model strictly, implement AbstractService for robust subclassing, migrate
yarn-common services
> -------------------------------------------------------------------------------------------------------------
>                 Key: YARN-530
>                 URL: https://issues.apache.org/jira/browse/YARN-530
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>         Attachments: YARN-117changes.pdf, YARN-530-005.patch, YARN-530-2.patch, YARN-530-3.patch,
YARN-530.4.patch, YARN-530.patch
> # Extend the YARN {{Service}} interface as discussed in YARN-117
> # Implement the changes in {{AbstractService}} and {{FilterService}}.
> # Migrate all services in yarn-common to the more robust service model, test.

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

View raw message