openwhisk-dev mailing list archives

From Tyson Norris <tnor...@adobe.com.INVALID>
Subject Re: Invoker HA on Mesos
Date Tue, 03 Apr 2018 16:00:09 GMT
One problem with this (delegating to ContainerFactory to share prewarm/warm containers with other
cluster nodes) is that ContainerFactory is currently ignorant of container state,
and making use of the shared containers requires sharing at least some of their state (besides
paused/running state). Specifically:
- when creating a prewarm, the kind needs to be shared
- when pausing a warm, the action needs to be shared

To handle this, ContainerFactory.createContainer(), Container.suspend(), and Container.resume()
would have to change to propagate this state.
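
To make the state in question concrete, here is a minimal Python sketch of what those three calls would have to publish. It is not OpenWhisk code: SharedContainerState, SharedStateRegistry, and the on_* method names are hypothetical, the in-memory dict stands in for whatever replicated store the cluster would use, and "nodejs:6" and "guest/hello" are only example values.

```python
from dataclasses import dataclass, replace
from typing import Dict, Optional


@dataclass(frozen=True)
class SharedContainerState:
    """What another node would need to know to reuse a container (hypothetical)."""
    container_id: str
    kind: Optional[str] = None    # known once a prewarm is created
    action: Optional[str] = None  # known once a warm container is paused
    paused: bool = False


class SharedStateRegistry:
    """Stand-in for a replicated store; here it is just a local dict."""

    def __init__(self) -> None:
        self._states: Dict[str, SharedContainerState] = {}

    def on_create_prewarm(self, container_id: str, kind: str) -> None:
        # Creating a prewarm: the kind has to be shared.
        self._states[container_id] = SharedContainerState(container_id, kind=kind)

    def on_suspend(self, container_id: str, action: str) -> None:
        # Pausing a warm container: the action has to be shared.
        previous = self._states.get(container_id, SharedContainerState(container_id))
        self._states[container_id] = replace(previous, action=action, paused=True)

    def on_resume(self, container_id: str) -> None:
        # Resuming: only the paused flag changes.
        previous = self._states[container_id]
        self._states[container_id] = replace(previous, paused=False)

    def get(self, container_id: str) -> SharedContainerState:
        return self._states[container_id]


registry = SharedStateRegistry()
registry.on_create_prewarm("c1", "nodejs:6")
registry.on_suspend("c1", "guest/hello")
```

The point of the sketch is only that the factory and container calls would each need an extra argument (kind or action) that they do not carry today.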

This seems slightly awkward to me, so I want to put it out for feedback. WDYT?



On Mar 30, 2018, at 2:31 PM, David P Grove <groved@us.ibm.com> wrote:


+1.  I like this design.

--dave

Tyson Norris <tnorris@adobe.com.INVALID> wrote on 03/30/2018 01:37:43 PM:

From: Tyson Norris <tnorris@adobe.com.INVALID>
To: "dev@openwhisk.apache.org" <dev@openwhisk.apache.org>
Date: 03/30/2018 01:37 PM
Subject: Re: Invoker HA on Mesos

Hooking into pause/unpause/destroy of containers seems plausible,
instead of hooking into the Maps in ContainerPool.

So in the existing PR, the ContainerPool uses an alternate impl for
Map to store freePool and prewarmPool, and that alternate impl
initiates the attach to existing containers, when it becomes active.

The ContainerPool could instead potentially delegate to the
ContainerFactory, e.g. a
ContainerFactory.reviveContainers(childFactory) => (freePool,
prewarmPool) - we will still need a way to trigger this on demand
(e.g. when the standby pool becomes active, in our case, but I think
that is a minor detail).
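
As a rough illustration of the shape such a call could take, here is a Python sketch that splits already-running containers into the two pools. It is an assumption-laden stand-in, not the PR's code: ExistingContainer and revive_containers are hypothetical names, the childFactory argument is left out, and the pools are plain dicts keyed by container id.

```python
from typing import Dict, List, NamedTuple, Optional, Tuple


class ExistingContainer(NamedTuple):
    """A container found running when the standby pool becomes active (hypothetical)."""
    container_id: str
    kind: Optional[str]    # known for prewarm containers
    action: Optional[str]  # known for warm containers that already ran an action


def revive_containers(
    existing: List[ExistingContainer],
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Split existing containers into (free_pool, prewarm_pool)."""
    free_pool: Dict[str, str] = {}
    prewarm_pool: Dict[str, str] = {}
    for container in existing:
        if container.action is not None:
            # Warm container: reusable for the same action.
            free_pool[container.container_id] = container.action
        elif container.kind is not None:
            # Prewarm container: reusable for any action of that kind.
            prewarm_pool[container.container_id] = container.kind
        # Containers with neither piece of state cannot be reused and are skipped.
    return free_pool, prewarm_pool


free_pool, prewarm_pool = revive_containers([
    ExistingContainer("c1", kind="nodejs:6", action="guest/hello"),
    ExistingContainer("c2", kind="nodejs:6", action=None),
])
```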

I can try it out; I will be out next week, but if you test any of
this in the meantime, let me know.

Thanks
Tyson


On Mar 30, 2018, at 9:58 AM, David P Grove <groved@us.ibm.com> wrote:


Tyson Norris <tnorris@adobe.com.INVALID> wrote on 03/27/2018 06:25:59 PM:

Do you have an example of the labels working? I guess the labels are
changed over time through the lifecycle of the container?


Apologies for brutally chopping the email chain; my mail client made a
horrible hash of it.

Right now, all we are doing with Kube labels is to label each action
container with its owning invoker on startup.  This lets us delete
orphaned containers if the invoker crashes and needs to be restarted.
The labeling happens at [1] and the removal of orphans using the labels
at [2].
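
The label-then-clean-up idea can be sketched in a few lines of Python. This does not call the Kubernetes API: containers are modeled as a dict from container name to its labels, and the label key "invoker" plus the function names are assumptions made for illustration.

```python
from typing import Dict, List

INVOKER_LABEL = "invoker"  # hypothetical label key


def labels_for_startup(invoker_id: str) -> Dict[str, str]:
    """Labels attached to an action container when it is started."""
    return {INVOKER_LABEL: invoker_id}


def find_orphans(containers: Dict[str, Dict[str, str]], invoker_id: str) -> List[str]:
    """Names of containers still labeled with this invoker, e.g. after it restarts."""
    return sorted(
        name
        for name, labels in containers.items()
        if labels.get(INVOKER_LABEL) == invoker_id
    )


containers = {
    "wsk-a": labels_for_startup("invoker0"),
    "wsk-b": labels_for_startup("invoker1"),
}
orphans = find_orphans(containers, "invoker0")  # candidates for deletion on restart
```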

I think the Kube-native version of part of what you are doing with the
DistributedData for Mesos would be to add and remove additional labels
to give us the option of attaching a new invoker instance to orphaned
containers instead of just destroying them.   Interacting with the
Kubernetes API server to do a labeling operation takes around 10ms, so
we couldn't do this on a truly hot path.  But we could probably afford
to update container labels in parallel with pause/unpause operations,
which could enable re-attachment to any paused containers.
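
A rough Python sketch of that idea, under stated assumptions: the "state" label key, the function names, and the pause/set_labels callables are all hypothetical stand-ins (no real Kubernetes client is used). The label update runs concurrently with the pause so its latency is not added on top, and a restarted invoker then re-attaches only to containers labeled as paused.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

STATE_LABEL = "state"  # hypothetical label key


def pause_and_label(
    container_id: str,
    pause: Callable[[str], None],
    set_labels: Callable[[str, Dict[str, str]], None],
) -> None:
    """Run the pause and the label update concurrently; return when both are done."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        futures = [
            pool.submit(pause, container_id),
            pool.submit(set_labels, container_id, {STATE_LABEL: "paused"}),
        ]
        for future in futures:
            future.result()  # re-raises if either operation failed


def split_orphans(orphans: Dict[str, Dict[str, str]]) -> Tuple[List[str], List[str]]:
    """Return (reattach, destroy): paused containers can be re-attached, the rest go."""
    reattach = sorted(n for n, labels in orphans.items() if labels.get(STATE_LABEL) == "paused")
    destroy = sorted(n for n in orphans if n not in reattach)
    return reattach, destroy


# Fake pause and label operations, recording what was requested.
paused: List[str] = []
labels: Dict[str, Dict[str, str]] = {}
pause_and_label("c1", paused.append, lambda cid, new: labels.setdefault(cid, {}).update(new))
```

Unpause would be the mirror image, clearing or changing the label, so that a container in use is never offered for re-attachment.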

--dave

[1] https://github.com/apache/incubator-openwhisk/blob/0b20df0f725a671f8e51c9e8793116476fd22f76/core/invoker/src/main/scala/whisk/core/containerpool/kubernetes/KubernetesContainerFactory.scala#L81
[2] https://github.com/apache/incubator-openwhisk/blob/0b20df0f725a671f8e51c9e8793116476fd22f76/core/invoker/src/main/scala/whisk/core/containerpool/kubernetes/KubernetesContainerFactory.scala#L57

