openwhisk-dev mailing list archives

From Chetan Mehrotra <chetan.mehro...@gmail.com>
Subject Re: Recording metadata related to activation
Date Tue, 27 Aug 2019 10:01:45 GMT
Thanks for all the feedback.

From @Dominic Kim

> One option can be storing them as parts of an activation for operators but
> exclude them when returning them in response to the user request.

Ack. Some of the metadata is more for diagnostic purposes and may
(should?) not be exposed to end users. So any implementation needs to
distinguish between public and private metadata.

From @Erez Hadad

> Bottom line: I think this "meta" information needs to be more streamlined
> end-to-end, available to code during invocation and persisted post-factum
> in the activation record.

Adding support for other such "meta" information would be on a
case-by-case basis. So far TransactionId is the only missing meta info
that we know beforehand, and hence we now pass it to the action. The
other meta info discussed so far is generated after the actual
invocation. If we later find any meta info that the system knows
beforehand, we can add support for passing it as well.

From @Tyson Norris

> I think a first step is to create separate meta dictionary on Activation
> (option 1) without changing the API (use annotations) or runtimes. We can
> iterate on invoker/runtime coordination to make passing this data more
> consistent, and change /init /run orchestration separately as needed.

Ack. So any proposed change should only change the internal storage
format. To the end user, any such meta info (the generic ones like
TransactionId) should only be exposed via annotations.

From @Matt Rutkowski

> The approach that I have seen work elsewhere I refer to as "tagging", that
> is "tagging" data (in this case activations) with domain-specific
> identifiers used to construct diff. views for diff. domains.

I like this idea. However, my only concern is that converting this to
an array would prevent us from being selective about which meta info we
index. Having the meta info as a dictionary would provide finer
control over which meta info an operator wants to index. For example, I
may only want to index TransactionId but not the k8s PodId, the latter
(podId) being used only for some diagnostic work.

Given that the activation db is very large, I would like to minimize
any overhead in terms of indexing meta info. One can still index all
the dict keys if needed (both Cosmos and Couch can index all keys
under a dict).
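
To illustrate the selective-indexing point, here is a minimal sketch in
Python. It is not a Cosmos or Couch API; the function name `index_fields`
and the `meta.<key>` field naming are hypothetical, chosen only to show
that a dictionary lets an operator pick individual keys to index.

```python
from typing import Any, Dict, Iterable


def index_fields(activation: Dict[str, Any],
                 indexed_keys: Iterable[str]) -> Dict[str, Any]:
    """Return only the meta entries an operator chose to index.

    Hypothetical helper: a real deployment would express this choice in
    the database's own index definition instead.
    """
    meta = activation.get("meta", {})
    return {f"meta.{key}": meta[key] for key in indexed_keys if key in meta}


activation = {
    "activationId": "a1",
    "meta": {"transactionId": "xxx", "podId": "ow_xxx"},
}

# Index transactionId only; podId stays stored but unindexed.
selected = index_fields(activation, ["transactionId"])
```

With an array of tags there is no per-key path to name, so the choice
would be between indexing every element or none.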

Updated Proposal
==============

1. Enable the `ContainerResponse` to include a "meta" map. Any key
that starts with `_`, like `_podId`, would be considered a private meta
key.
2. Record all this meta info in the activation under the "meta" key. This
can also be augmented with system-provided meta keys like
transactionId.
3. When sending the activation record to the client:
    - Remove the "meta" dict
    - Include all "public" meta keys like `transactionId` as annotation entries
Chetan Mehrotra

On Wed, Aug 21, 2019 at 9:43 AM Matt Rutkowski <mrutkowski@apache.org> wrote:
>
> If we intend to add another top-level key to the data to make it more
> accessible for index/search, we should do so in a manner that is extensible
> for any number of IDs.  Index/search, as well as security and business
> audits, require identifiers exclusively and this, in my view, is different
> from general metadata which should be more descriptive and disposable.
>
> The approach that I have seen work elsewhere I refer to as "tagging", that
> is "tagging" data (in this case activations) with domain-specific
> identifiers used to construct diff. views for diff. domains.
>
> A single key is assoc. with a list of any number of these domain specific
> identifiers each expressed as a URI where the URI components include a
> prefix/domain that identifies the domain wherein the ID is unique (and
> consequently how to interpret the ID), optional paths can be used to
> further describe the ID's unique space (resource or purpose) and end with
> the actual ID.  URIs, aside from being self-descriptive for interpretation,
> are desirable as they intrinsically avoid collisions and also do not
> require a key as the URI prefix/domain/path uniquely identify the
> domain/purpose of the identifier within the same string.
>
> We could define any number of IDs that are recognized by the OW domain and
> even create a reserved prefix to keep them short, e.g.:
>
> full: "//openwhisk.apache.org/transaction/<UID>"
> prefixed: "ow:transaction-<UID>"
>
> For example, let's say an activation handled credit card data, one could
> "tag" the record with a PCI indicator:
>
> "//GRC20.gov/cloud/security/pci-dss/transaction/<UID>"
>
> these could appear on an optional key such as:
>
> {
>    "tags":[
>       "p1://d1/id1",
>       "p2://d2/id2",
>       ...
>    ]
> }
>
> tags do not necessarily need to be for IDs alone... that is they can also
> help in aggregating search data; for example, we could "tag" all data that
> was assigned to a certain region or cluster using this method as well:
>
> {
>    "tags":[
>       "//ibmcloud.com/icf/region/us-south/cluster/0fdeg1",
>       "ow:cluster-kube-055b10f",
>       "ow:trans-0555ffca456919",
>       ...
>    ]
> }
>
> of course, the array could be limited in size and downstream processors
> (search or otherwise) could easily "pick out" what tags they care about and
> discard ones they do not.
>
> On 2019/08/20 10:30:19, Chetan Mehrotra <chetan.mehrotra@gmail.com> wrote:
> > Hi Team,
> >
> > Branching the thread [1] to discuss how to record some metadata
> > related to activation. Based on some of the use cases, I see a need to
> > record some more metadata related to activation. Some examples are:
> >
> > 1. transactionId - Records the transactionId that the activation is part of
> > 2. pod name - Records the pod running the action container when using
> > KubernetesContainerFactory
> > 3. invocationId - Some id returned by underlying system when
> > integrating with AWS Lambda or Azure Function
> > 4. clusterId - If running multiple clusters for the same system, we would
> > like to know which cluster handled the given execution
> >
> > Some of these ids are determined as part of `ContainerResponse` itself
> > and have to be made part of activation json such that later we can
> > correlate the activation with other parts.
> >
> > Now we need to determine how to store such ids
> >
> > Option 1 - New "meta" sub document
> > -----------
> >
> > Introduce a new "meta" key in activation json under which we store such ids
> >
> > "meta" : {
> >             "transactionId" : "xxx",
> >             "podId" : "ow_xxx"
> >         }
> >
> >
> > Option 2 - Store them as annotations
> > -------------
> >
> > Instead of introducing a new field, we store them as annotations. Note
> > that we still make a change in code to capture such data as part of
> > `ContainerResponse` but just map it to annotations
> >
> > One drawback of this approach is that the current structure of
> > annotations makes it harder to index such fields easily. Having a flat
> > structure like the "meta" field enables indexing such fields in dbs
> > other than Couch
> >
> > Chetan Mehrotra
> > [1]: https://lists.apache.org/thread.html/f8b73a9ffb0d09a50aecfb54538da2e8365c54dcc3e26a78382ad7bd@%3Cdev.openwhisk.apache.org%3E
> >
