aurora-reviews mailing list archives

From Santhosh Kumar Shanmugham <santhoshkuma...@gmail.com>
Subject Re: Review Request 51874: Change framework_name default value from 'TwitterScheduler' to 'Aurora'
Date Thu, 15 Sep 2016 00:33:20 GMT


> On Sept. 14, 2016, 3:48 p.m., Maxim Khutornenko wrote:
> > src/main/java/org/apache/aurora/scheduler/mesos/CommandLineDriverSettingsModule.java, line 82
> > <https://reviews.apache.org/r/51874/diff/5/?file=1498686#file1498686line82>
> >
> >     Did you try to roll back to a pre-0.15 scheduler while changing the framework name?
> >     Trying to see if we can drop this 'backwards incompatible' statement now.

Tested "roll-forward" (to Aurora) and "roll-back" (to TwitterScheduler, via both a release
change and a config change) on Aurora-0.14 (depends on Mesos-0.27.2) and Aurora-0.15 (depends
on Mesos-0.28.2). In each case the master was able to re-register the framework with the same
"id" and the running tasks continued to make progress. (See details in the testing section.)
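
As a rough illustration of the "same id" check described above, the master log could be
scanned for the "failed over" lines and the framework ids compared. This is only a sketch:
the names FAILED_OVER, SAMPLE_LOG, failovers and same_framework_id are hypothetical, and the
regular expression assumes the master.cpp line format seen in the testing section of this
review (two of those lines are reused as sample input).

```python
import re

# Assumed format (taken from the master log lines quoted in this review):
#   "... Framework <id> (<name>) at scheduler-...@host:port failed over"
FAILED_OVER = re.compile(
    r"Framework (?P<id>\S+)\s+\((?P<name>[^)]+)\) at \S+ failed\s+over"
)

# Two "failed over" lines copied from the roll-forward and roll-back runs.
SAMPLE_LOG = """\
I0914 16:48:43.722256  9813 master.cpp:2577] Framework 071c44a1-b4d4-4339-a727-03a79f725851-0000 (Aurora) at scheduler-75517c8f-5913-49e9-8cc4-342a78c9bbcb@192.168.33.7:8083 failed over
I0914 16:51:49.614359  9813 master.cpp:2577] Framework 071c44a1-b4d4-4339-a727-03a79f725851-0000 (TwitterScheduler) at scheduler-dfad8309-de4b-47d8-a8f8-82828ea40a12@192.168.33.7:8083 failed over
"""


def failovers(log_text):
    """Return (framework_id, framework_name) for every 'failed over' line."""
    return [(m.group("id"), m.group("name")) for m in FAILED_OVER.finditer(log_text)]


def same_framework_id(log_text):
    """True if at least one failover was logged and all of them share one id."""
    ids = {framework_id for framework_id, _ in failovers(log_text)}
    return len(ids) == 1


if __name__ == "__main__":
    for framework_id, name in failovers(SAMPLE_LOG):
        print(framework_id, name)
    print("same id across renames:", same_framework_id(SAMPLE_LOG))
```

A check like this only confirms that the id is stable across the rename. Whether the running
tasks are unaffected still has to be verified separately.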

However, I could not roll back the scheduler from 0.15 to 0.14 from source inside vagrant.
"aurorabuild all" started to complain with the message:
"Could not satisfy all requirements for mesos.native==0.27.2"


- Santhosh Kumar


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51874/#review148988
-----------------------------------------------------------


On Sept. 14, 2016, 1:58 p.m., Santhosh Kumar Shanmugham wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51874/
> -----------------------------------------------------------
> 
> (Updated Sept. 14, 2016, 1:58 p.m.)
> 
> 
> Review request for Aurora, Joshua Cohen and Maxim Khutornenko.
> 
> 
> Bugs: AURORA-1688
>     https://issues.apache.org/jira/browse/AURORA-1688
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> Change framework_name default value from 'TwitterScheduler' to 'Aurora'
> 
> 
> Diffs
> -----
> 
>   RELEASE-NOTES.md ad2c68a6defe07c94480d7dee5b1496b50dc34e5 
>   src/main/java/org/apache/aurora/scheduler/mesos/CommandLineDriverSettingsModule.java 8a386bd208956eb0c8c2f48874b0c6fb3af58872 
>   src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh 97677f24a50963178a123b420d7ac136e4fde3fe

> 
> Diff: https://reviews.apache.org/r/51874/diff/
> 
> 
> Testing
> -------
> 
> ./build-support/jenkins/build.sh
> ./src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh
> 
> Testing to ensure backward compatibility:
> 
> Case 1: Rolling forward does not impact running tasks:
> Renaming framework from 'TwitterScheduler' to 'Aurora':
> 
> The framework re-registers after restart (treated by master as failover) and gets the
> same framework-id. Running tasks remain unaffected.
> 
> Master log:
> I0914 16:48:28.408182  9815 master.cpp:1297] Giving framework 071c44a1-b4d4-4339-a727-03a79f725851-0000 (TwitterScheduler) at scheduler-75517c8f-5913-49e9-8cc4-342a78c9bbcb@192.168.33.7:8083 3weeks to failover
> I0914 16:48:28.408226  9815 hierarchical.cpp:382] Deactivated framework 071c44a1-b4d4-4339-a727-03a79f725851-0000
> E0914 16:48:28.408617  9819 process.cpp:2105] Failed to shutdown socket with fd 28: Transport endpoint is not connected
> I0914 16:48:43.722126  9813 master.cpp:2424] Received SUBSCRIBE call for framework 'Aurora' at scheduler-dfad8309-de4b-47d8-a8f8-82828ea40a12@192.168.33.7:8083
> I0914 16:48:43.722190  9813 master.cpp:2500] Subscribing framework Aurora with checkpointing enabled and capabilities [ REVOCABLE_RESOURCES, GPU_RESOURCES ]
> I0914 16:48:43.722225  9813 master.cpp:2564] Updating info for framework 071c44a1-b4d4-4339-a727-03a79f725851-0000
> I0914 16:48:43.722256  9813 master.cpp:2577] Framework 071c44a1-b4d4-4339-a727-03a79f725851-0000 (Aurora) at scheduler-75517c8f-5913-49e9-8cc4-342a78c9bbcb@192.168.33.7:8083 failed over
> I0914 16:48:43.722429  9813 hierarchical.cpp:348] Activated framework 071c44a1-b4d4-4339-a727-03a79f725851-0000
> I0914 16:48:43.722595  9813 master.cpp:5709] Sending 1 offers to framework 071c44a1-b4d4-4339-a727-03a79f725851-0000 (Aurora) at scheduler-dfad8309-de4b-47d8-a8f8-82828ea40a12@192.168.33.7:8083
> 
> Scheduler log:
> I0914 16:48:44.157 [Thread-10, MesosSchedulerImpl:151] Registered with ID value: "071c44a1-b4d4-4339-a727-03a79f725851-0000"
> , master: id: "461b98b8-63e1-40e3-96fd-cb62420945ae"
> ip: 119646400
> port: 5050
> pid: "master@192.168.33.7:5050"
> hostname: "aurora.local"
> version: "1.0.0"
> address {
>   hostname: "aurora.local"
>   ip: "192.168.33.7"
>   port: 5050
> }
> 
> Case 2: Rolling backward does not impact running tasks:
> Rolling back framework name from 'Aurora' to 'TwitterScheduler':
> 
> The framework re-registers after restart (treated by master as failover) and gets the
> same framework-id. Running tasks remain unaffected.
> 
> Master log:
> I0914 16:51:33.203495  9812 master.cpp:1297] Giving framework 071c44a1-b4d4-4339-a727-03a79f725851-0000 (Aurora) at scheduler-dfad8309-de4b-47d8-a8f8-82828ea40a12@192.168.33.7:8083 3weeks to failover
> I0914 16:51:33.203526  9812 hierarchical.cpp:382] Deactivated framework 071c44a1-b4d4-4339-a727-03a79f725851-0000
> I0914 16:51:49.614074  9813 master.cpp:2424] Received SUBSCRIBE call for framework 'TwitterScheduler' at scheduler-6fa8b819-aed9-42e1-9c6c-3e4be2f62500@192.168.33.7:8083
> I0914 16:51:49.614215  9813 master.cpp:2500] Subscribing framework TwitterScheduler with checkpointing enabled and capabilities [ REVOCABLE_RESOURCES, GPU_RESOURCES ]
> I0914 16:51:49.614312  9813 master.cpp:2564] Updating info for framework 071c44a1-b4d4-4339-a727-03a79f725851-0000
> I0914 16:51:49.614359  9813 master.cpp:2577] Framework 071c44a1-b4d4-4339-a727-03a79f725851-0000 (TwitterScheduler) at scheduler-dfad8309-de4b-47d8-a8f8-82828ea40a12@192.168.33.7:8083 failed over
> I0914 16:51:49.614977  9813 hierarchical.cpp:348] Activated framework 071c44a1-b4d4-4339-a727-03a79f725851-0000
> I0914 16:51:49.615170  9813 master.cpp:5709] Sending 1 offers to framework 071c44a1-b4d4-4339-a727-03a79f725851-0000 (TwitterScheduler) at scheduler-6fa8b819-aed9-42e1-9c6c-3e4be2f62500@192.168.33.7:8083
> 
> Scheduler log:
> I0914 16:51:50.249 [Thread-10, MesosSchedulerImpl:151] Registered with ID value: "071c44a1-b4d4-4339-a727-03a79f725851-0000"
> , master: id: "461b98b8-63e1-40e3-96fd-cb62420945ae"
> ip: 119646400
> port: 5050
> pid: "master@192.168.33.7:5050"
> hostname: "aurora.local"
> version: "1.0.0"
> address {
>   hostname: "aurora.local"
>   ip: "192.168.33.7"
>   port: 5050
> }
> 
> Case 3: Restarting with old framework_name (rolling back config) does not impact running tasks:
> Restarting the scheduler after updating the config from 'Aurora' to 'TwitterScheduler':
> 
> The rename takes effect. The master re-registers the framework with the same id. Running
> tasks remain unaffected.
> 
> Master log:
> I0914 20:34:58.059640 28176 master.cpp:1297] Giving framework 071c44a1-b4d4-4339-a727-03a79f725851-0000 (Aurora) at scheduler-4a7c21b7-5d90-4218-936e-4142051b3444@192.168.33.7:8083 3weeks to failover
> I0914 20:34:58.059675 28176 hierarchical.cpp:382] Deactivated framework 071c44a1-b4d4-4339-a727-03a79f725851-0000
> I0914 20:35:23.447479 28175 master.cpp:2424] Received SUBSCRIBE call for framework 'TwitterScheduler' at scheduler-cea31751-7cb5-46b2-8208-f9ab1d4fe86c@192.168.33.7:8083
> I0914 20:35:23.447573 28175 master.cpp:2500] Subscribing framework TwitterScheduler with checkpointing enabled and capabilities [ REVOCABLE_RESOURCES, GPU_RESOURCES ]
> I0914 20:35:23.447592 28175 master.cpp:2564] Updating info for framework 071c44a1-b4d4-4339-a727-03a79f725851-0000
> I0914 20:35:23.447615 28175 master.cpp:2577] Framework 071c44a1-b4d4-4339-a727-03a79f725851-0000 (TwitterScheduler) at scheduler-4a7c21b7-5d90-4218-936e-4142051b3444@192.168.33.7:8083 failed over
> I0914 20:35:23.447777 28175 hierarchical.cpp:348] Activated framework 071c44a1-b4d4-4339-a727-03a79f725851-0000
> I0914 20:35:23.447968 28175 master.cpp:5709] Sending 1 offers to framework 071c44a1-b4d4-4339-a727-03a79f725851-0000 (TwitterScheduler) at scheduler-cea31751-7cb5-46b2-8208-f9ab1d4fe86c@192.168.33.7:8083
> 
> Scheduler log:
> I0914 20:35:24.000 [Thread-10, MesosSchedulerImpl:151] Registered with ID value: "071c44a1-b4d4-4339-a727-03a79f725851-0000"
> , master: id: "848618fb-714d-4b00-ad80-950f6bdc70c6"
> ip: 119646400
> port: 5050
> pid: "master@192.168.33.7:5050"
> hostname: "aurora.local"
> version: "1.0.0"
> address {
>   hostname: "aurora.local"
>   ip: "192.168.33.7"
>   port: 5050
> }
> 
> 
> Thanks,
> 
> Santhosh Kumar Shanmugham
> 
>

