openwhisk-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Tyson Norris <tnor...@adobe.com.INVALID>
Subject Re: Proposal on a future architecture of OpenWhisk
Date Fri, 20 Jul 2018 16:24:07 GMT
On Logging, I think if you are considering enabling concurrent activation processing, you will
encounter that the only approach to parsing logs to be associated with a specific activationId,
is to force the log output to be structured, and always include the activationId with every
log message. This requires a change at the action container layer, but the simpler thing to
do is to encourage action containers to provide a structured logging context that action developers
can (and must) use to generate logs. 

An example is nodejs container - for the time being, we are hijacking the stdout/stderr and
injecting the activationId when any developer code writes to stdout/stderr (as console.log/console.error).
This may not work as simply in all action containers, and isn’t great even in nodejs. 

I would rather encourage action containers to provide a logging context, where action devs
use: log.info, log.debug, etc, and this logging context does the needful to assert some structure
to the log format. In general, many (most?) languages have conventions (slf4xyz, et al) for
this already, and while you lose “random writes to stdout”, I haven’t seen this be an
actual problem. 

If you don’t deal with interleaved logs (typically because activations don’t run concurrently),
this this is less of an issue, but regardless, writing log parsers is a solved problem that
would still be good to offload to external (not in OW controller/invoker) systems (logstash,
fluentd, splunk, etc). This obviously comes with a caveat that logs parsing will be delayed,
but that is OK from my point of view, partly because most logs will never be viewed, and partly
because the log ingest systems are mostly fast enough already to limit this delay to seconds
or milliseconds.  

Thanks
Tyson
> On Jul 20, 2018, at 8:46 AM, David P Grove <groved@us.ibm.com> wrote:
> 
> 
> Rethinking the architecture to more fully exploit the capabilities of the
> underlying container orchestration platforms is pretty exciting.  I think
> there are lots of interesting ideas to explore about how best to schedule
> the workload.
> 
> As brought out in the architecture proposal [1], although it is logically
> an orthogonal issue, improving the log processing for user containers is a
> key piece of this roadmap.  The initial experiences with the
> KubernetesContainerFactory indicate that post-facto log enrichment to add
> the activation id to each log line is a serious bottleneck.  It adds
> complexity to the system and measurably reduces system performance by
> delaying the re-use of action containers until the logs can be extracted
> and processing.
> 
> I believe what we really want is to be using an openwhisk-aware log driver
> that will dynamically inject the current activation id into every log line
> as soon as it is written.  Then the user container logs, already properly
> enriched when they are generated, can be fed directly into the platform
> logging system with no post-processing needed.
> 
> If the low-level container runtime is docker 17.09 or better, I think we
> could probably achieve this by writing a logging driver plugin [2] that
> extends docker's json logging driver.  For non-blackbox containers, I think
> we "just" need the /run method to update a shared location that is
> accessible to the logging driver plugin with the current activation id
> before it invokes the user code.  As log lines are produced, that location
> is read and the string with the activation id gets injected into the json
> formatted log line as it is produced.   For blackbox containers, we could
> have our dockerskeleton do the same thing, but the user would have to opt
> in somehow to the protocol if they were using their own action runner.
> Warning:  I haven't looked into how flushing works with these drivers, so
> I'm not sure that this really works....we need to make sure we don't enrich
> a log line with the wrong activation id because of delayed flushing.
> 
> If we're running on Kubernetes, we might decide that instead of using a
> logging driver plugin, to use a streaming sidecar container as shown in [3]
> and have the controller interact with the sidecar to update the current
> activation id (or have the sidecar read it from a shared memory location
> that is updated by /run to minimize the differences between deployment
> platforms).  I'm not sure this really works as well, since the sidecar
> might fall behind in processing the logs, so we might still need a
> handshake somewhere.
> 
> A third option would be to extend our current sentineled log design by also
> writing a "START_WHISK_ACTIVATION_LOG <ACTIVATION_ID>" line in the /run
> method before invoking the user code.  We'd still have to post-process the
> log files, but it could be decoupled from the critical path since the
> post-processor would have the activation id available to it in the log
> files (and thus would not need to handshake with the controller at all,
> thus we could offload all logging to a node-level log processing/forwarding
> agent).
> 
> Option 3 would be really easy to implement and is independent of the
> details of the low-level log driver, but doesn't eliminate the need to
> post-process the logs. It just makes it easier to move that processing off
> any critical path.
> 
> Thoughts?
> 
> --dave
> 
> [1] https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fcwiki.apache.org%2Fconfluence%2Fdisplay%2FOPENWHISK%2FOpenWhisk%2Bfuture&amp;data=02%7C01%7Ctnorris%40adobe.com%7C7c32fc01b3c44cfb7c3d08d5ee58a341%7Cfa7b1b5a7b34438794aed2c178decee1%7C0%7C0%7C636676986817234391&amp;sdata=3CAF0psQZ%2BTQCOuhSycTUaTYfcTgh3eIPlw1IDYvLWU%3D&amp;reserved=0
> +architecture
> [2] https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fdocs.docker.com%2Fv17.09%2Fengine%2Fadmin%2Flogging%2Fplugins%2F&amp;data=02%7C01%7Ctnorris%40adobe.com%7C7c32fc01b3c44cfb7c3d08d5ee58a341%7Cfa7b1b5a7b34438794aed2c178decee1%7C0%7C0%7C636676986817234391&amp;sdata=%2FZjzrOh65Bak8iDMvk8FgftPDjwdJCEKvYzTPOy0o3U%3D&amp;reserved=0
> [3] https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fkubernetes.io%2Fdocs%2Fconcepts%2Fcluster-administration%2Flogging%2F&amp;data=02%7C01%7Ctnorris%40adobe.com%7C7c32fc01b3c44cfb7c3d08d5ee58a341%7Cfa7b1b5a7b34438794aed2c178decee1%7C0%7C0%7C636676986817234391&amp;sdata=DaYQhz70aF0TVgGpSqzhOZZQC8XNT0znAMMDbUkHlSo%3D&amp;reserved=0

Mime
View raw message