openwhisk-dev mailing list archives

From Michele Sciabarra <mich...@sciabarra.com>
Subject Re: [LONG] Discussing my implementation of Go actions
Date Sat, 10 Mar 2018 08:54:30 GMT
# the problem of the protocol

I agree with your view. What I actually did was reverse-engineer some of the runtimes
(most notably the dockerskeleton) to implement a Go-based docker skeleton. For efficiency,
I had to depart from some of the current practices.

As you remarked, the basic problem is that Java, Python and Node can "load code" dynamically,
so the protocol can be hidden in the proxy, while for binaries you cannot do that. So I have
to mandate a protocol for the binaries.

We could hide this protocol in a library, but there are infinite ways of generating a binary
(not just Go and Swift, but also Rust, Haskell, even C#), so we cannot count on a library for
each of them. And in practice, executing a binary also means running any interpreted language,
so you could efficiently support Ruby, bash or whatever is not already available among the
supported programming languages.

As someone noted, OpenFaaS claims to be able to do that. I should investigate more closely what
OpenFaaS does, maybe to copy some ideas :). I know we should focus on Go, but that is not the
nature of the problem. The nature of the problem is supporting generic executables.

So what we really need to do is create AND document a protocol that lets generic Unix
executables interface with OpenWhisk. And the protocol should be simple enough that it can
be implemented without libraries!

----
# current protocol situation

At this stage, we implicitly have a protocol for native actions. Let's call it the "OpenWhisk
Native Protocol", currently at "version 0.9".

This protocol is actually in use, at least for Swift actions, where I guess there is already a
significant user base, and it is somewhat documented in the dockerskeleton. So what we are really
discussing here, in my interpretation, is:

How can we evolve the protocol to make the transition easier for the existing code base?

So let's try to put what we have been discussing here in a formal way.

For the current one, the "OpenWhisk Native Protocol v0.9", we have:

actions receive the input on stdin AND on the command line, and can produce as much output
as they like on stdout and stderr, as long as the last line on stdout is a valid JSON object.
It is a no-brainer that this is not a very efficient implementation, since a new process is
started for every run.

Also, binaries are not required to identify themselves or the protocol they speak (something
that reminds me of HTTP/0.9).
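To make the v0.9 contract concrete, here is a minimal Go sketch of the proxy side, assuming the proxy has already collected the whole stdout of one run. `lastJSONObject` is a hypothetical helper, not code taken from the dockerskeleton:

```go
package main

import (
	"bytes"
	"encoding/json"
	"errors"
	"fmt"
)

// lastJSONObject returns the last line of an action's stdout (ignoring
// trailing newlines) decoded as a JSON object; earlier lines count as logs.
func lastJSONObject(stdout []byte) (map[string]interface{}, error) {
	// Drop trailing newlines so a final "\n" does not yield an empty last line.
	lines := bytes.Split(bytes.TrimRight(stdout, "\r\n"), []byte("\n"))
	last := bytes.TrimSpace(lines[len(lines)-1])
	if len(last) == 0 {
		return nil, errors.New("action produced no output on stdout")
	}
	var result map[string]interface{}
	if err := json.Unmarshal(last, &result); err != nil {
		return nil, fmt.Errorf("last stdout line is not a JSON object: %w", err)
	}
	if result == nil { // the literal null decodes without error
		return nil, errors.New("last stdout line is null, not a JSON object")
	}
	return result, nil
}

func main() {
	stdout := []byte("a log line\nanother log line\n{\"greeting\":\"Hello, stranger\"}\n")
	result, err := lastJSONObject(stdout)
	fmt.Println(result, err)
}
```

Note that the whole output has to be buffered until the process exits before the result can be extracted, which is part of why this model is costly.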

---
# the discussion for protocol v1

For the "OpenWhisk Native Protocol v1", the one I am trying to implement, I am proposing this
solution:

- the native binary must identify itself (for error detection) with {"openwhisk": 1} (with
a view to becoming 2, 3, 4 for streaming support)
- it will loop on stdin, produce output on stdout (one JSON object per line), and log on stderr

HOWEVER, as has been noted, this will create problems with existing Swift binaries: Swift
users log with Swift's print, which writes to stdout.

So it was recommended that I use a different channel (channel 3?) and skip the handshake.
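For reference, this is roughly what channel 3 would look like on the proxy side in Go. It is a minimal sketch assuming one process per run, where the result arrives on fd 3 and everything on stdout/stderr is treated as logs; `runWithResultChannel` is a hypothetical name, while `exec.Cmd.ExtraFiles` is the standard library field that maps entry i to descriptor 3+i in the child:

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"os"
	"os/exec"
)

// runWithResultChannel starts an action whose result is written to file
// descriptor 3, while stdout and stderr are both collected as logs.
func runWithResultChannel(name string, args ...string) (result, logs string, err error) {
	r, w, err := os.Pipe()
	if err != nil {
		return "", "", err
	}
	defer r.Close()

	var logBuf bytes.Buffer
	cmd := exec.Command(name, args...)
	cmd.Stdout = &logBuf // same writer for both: exec serialises the writes
	cmd.Stderr = &logBuf
	cmd.ExtraFiles = []*os.File{w} // ExtraFiles[0] becomes fd 3 in the child
	if err := cmd.Start(); err != nil {
		w.Close()
		return "", "", err
	}
	// Close the parent's copy of the write end, so ReadAll sees EOF
	// as soon as the child (the only remaining writer) exits.
	w.Close()
	out, readErr := io.ReadAll(r)
	waitErr := cmd.Wait()
	if readErr != nil {
		return "", logBuf.String(), readErr
	}
	return string(out), logBuf.String(), waitErr
}

func main() {
	res, logs, err := runWithResultChannel("sh", "-c", `echo "a log line"; echo '{"ok":true}' >&3`)
	fmt.Printf("result=%q logs=%q err=%v\n", res, logs, err)
}
```

In the pipe-loop model the proxy would instead keep the process alive and read fd 3 line by line; on the action side, a Go binary could open the descriptor with `os.NewFile(3, "result")`.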

My concerns are the worse error detection (for better or worse, I still think that detecting
at init time a misbehaving binary that does not support the protocol is a good thing), and the
fact that coding will become a bit awkward for both Go and Swift users. And I believe the
handshake should be used anyway.

At this stage, my idea is just to add a couple of environment variables to my Go proxy, like:

OPENWHISK_OUTPUT_CHANNEL=3
OPENWHISK_REQUIRE_HANDSHAKE=no

so the proxy can be used with no changes for Swift actions, while keeping a natural behaviour
for Go actions. A new Docker image is required anyway to support this. However, I think this
idea should be discussed.
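A minimal sketch of how the Go proxy could read these two variables. The variable names come from the proposal above; the accepted values ("yes"/"no", descriptor 1 or 3), the defaults and `configFromEnv` are my assumptions:

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// protocolConfig holds the proxy knobs proposed above.
type protocolConfig struct {
	OutputFD         int  // descriptor carrying results: 1 (stdout) or 3
	RequireHandshake bool // expect {"openwhisk":1} before the first request
}

// configFromEnv reads OPENWHISK_OUTPUT_CHANNEL and OPENWHISK_REQUIRE_HANDSHAKE,
// defaulting to the Go-friendly behaviour: results on stdout, handshake required.
func configFromEnv(getenv func(string) string) (protocolConfig, error) {
	cfg := protocolConfig{OutputFD: 1, RequireHandshake: true}
	if v := getenv("OPENWHISK_OUTPUT_CHANNEL"); v != "" {
		fd, err := strconv.Atoi(v)
		if err != nil || (fd != 1 && fd != 3) {
			return cfg, fmt.Errorf("OPENWHISK_OUTPUT_CHANNEL must be 1 or 3, got %q", v)
		}
		cfg.OutputFD = fd
	}
	if v := getenv("OPENWHISK_REQUIRE_HANDSHAKE"); v != "" {
		switch strings.ToLower(v) {
		case "yes", "true", "1":
			cfg.RequireHandshake = true
		case "no", "false", "0":
			cfg.RequireHandshake = false
		default:
			return cfg, fmt.Errorf("OPENWHISK_REQUIRE_HANDSHAKE must be yes or no, got %q", v)
		}
	}
	return cfg, nil
}

func main() {
	cfg, err := configFromEnv(os.Getenv)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("%+v\n", cfg)
}
```

Passing `os.Getenv` as a function keeps the parsing testable without touching the real environment; a Swift image would simply set the two variables in its Dockerfile.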

# proposal to document the native protocol

However, what I am proposing here is just to create a page, openwhisk-native-protocol.md,
discuss the protocol first, write down this behaviour as the "OpenWhisk Native Protocol" v1,
and be prepared for further evolutions of the protocol to support streaming and other planned
features.


On Fri, Mar 9, 2018, at 9:53 PM, Rodric Rabbah wrote:
> This is a good discussion - thanks for bringing it to the dev list.
> 
> In essence, native actions push the boundary of how much of the function
> abstraction we can maintain. For some of the managed runtimes which include
> Node.js, Python and Java, we are able to hide the protocol you allude to in
> what we loosely have called the runtime proxy. The proxy is where the
> initialization and run protocols are relevant. We deliberately resisted
> publishing and documenting the proxy protocol for some time.
> 
> As you observed, the initialization must handshake to the invoker - we do
> this today with a generic HTTP response code of 2xx. Anything else is
> treated as an error. In your current proposal, this is equivalent to
> {"openwhisk": 1}.
> 
> The function can only reach the run stage if initialization returns
> successfully. This does not however provide any strong guarantee that the
> action is well formed or valid. Only that for Node.js for example, that it
> parses OK. All errors are eventually detected at the run phase.
> 
> In the work you've done, because of the nature of the native functions, one
> way to look at the changes is that you're exposing the implicit proxy
> protocol to the "function" itself. It must confirm that it's ready (for
> whatever notion of readiness it deems valid), and only then is the run
> executed.
> 
> In this way, one can say that a function execution in this model follows a
> different model, where there are at least two (although it can admit three)
> stages: the function must implement an initializer and a run method, and if
> you admit the third, a shutdown method (we don't have this today in general
> but it would be nice to allow a "function" to shutdown cleanly before its
> resources are killed).
> 
> These changes are a departure from the simple interface that some of the
> languages have, but already for native functions, they _have_ to do
> something different. Namely, they receive the input from stdin, and they
> produce the result on stdout (as the last line); this is not a nice and
> clean function interface compared to say Node.js, Python, or Java.
> 
> So I like to think of this and suggest it as a way of grounding the
> discussion in terms of a foreign function interface for OpenWhisk. A
> native function already breaks some of the clean abstractions. Before I
> elaborate further, I will pause here for feedback.
> 
> -r
> 
> On Fri, Mar 9, 2018 at 6:13 AM, Michele Sciabarra <openwhisk@sciabarra.com>
> wrote:
> 
> > I just opened a PR with my version of the Golang action implementation. It
> > makes some "breaking" changes, and there is some discussion on the Slack
> > channel.
> >
> > So I am reporting the current situation here, looking for advice and change
> > recommendations. I am a bit confused, and if I remember correctly, one Apache
> > rule is that the mailing list is the ultimate source of truth...
> >
> > It currently works this way (I call it the "pipe-loop" protocol)
> >
> > A golang action (or a generic binary) is expected to follow this
> > "protocol":
> >
> > * starts with {"openwhisk": 1}
> > * reads one line on standard input, expecting a JSON object ON A SINGLE LINE
> > * processes the line, emitting logs on stderr (can be multiple lines)
> > * outputs a line on stdout in JSON format ON A SINGLE LINE
> > * repeats forever
> >
> > It is important to note that this design is easy to implement and works even
> > for bash scripts, and it also makes it easy to use Perl, Ruby or Haskell in an
> > EFFICIENT way. Indeed, this bash script (with jq) is part of my tests:
> >
> > ---
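> > #!/bin/bash
> > # handshake: announce the protocol version
> > echo '{"openwhisk":1}'
> > # one JSON request per line on stdin; -r keeps backslashes intact
> > while IFS= read -r line
> > do
> >    name="$(printf '%s\n' "$line" | jq -r .name)"
> >    logger -s "name=$name"    # -s also copies the message to stderr
> >    hello="Hello, $name"
> >    logger -s "sent response"
> >    # jq -c builds properly escaped JSON on a single line
> >    jq -cn --arg hello "$hello" '{hello: $hello}'
> > done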
> > ---
> >
> > Things discussed:
> >
> > 1) remove the header {"openwhisk":1}
> >
> > Initially it was not there, but I decided to add this requirement
> > because the action needs to speak a protocol ANYWAY.
> >
> > Most importantly, let me explain why I require it to start with {"openwhisk": 1}.
> >
> > The main reason is: I start the child process at init time, and I wanted
> > to detect when it does not behave properly.
> >
> > The simplest problem happens when the action crashes immediately. A
> > common cause is uploading a binary that depends on dynamic libraries not
> > available in the runtime, a Swift action for example: by default it loads
> > a lot of different libraries, so it crashes immediately, but I cannot
> > detect it until I try to use its stdin/stdout pipes.
> >
> > I can remove this requirement if someone can show me the Go code to check
> > that cmd.Start("true") or cmd.Start("pwd") exited 😃
> >
> > If it is not doable and I skip the handshake, then even if the command
> > crashed, I will not detect the problem until a /run is executed and the
> > action times out...
> >
> > Carlos says it is fine. It is OK for me, but I still think early problem
> > detection would be better. Also, James recommended that I give the user as
> > much error detection as possible, as early as possible. Kind of conflicting
> > directives here...
> >
> > Suggestions?
> >
> > 2) more checks at init time
> >
> > I added some sanity checks, probably too many. I tried to detect
> > errors at deployment time, not at invocation time.
> >
> > This is different from what, for example, the dockerskeleton currently does.
> >
> > If, for example, I upload something wrong, like a non-zip or a non-ELF
> > executable, my init returns {"error": "description"}, while the
> > dockerskeleton currently always returns OK.
> >
> > Recommendations here?
> >
> > 3) output the result to another channel
> >
> > Currently I require that logs go to stderr, while stdout is for interacting
> > with the parent process.
> >
> > Rodric suggested outputting to a separate channel (channel 3?) and using
> > stdout and stderr for logs.
> >
> > While doable, I would need to provision another pipe, and the implementation
> > would probably need some syscalls to retrieve file descriptor 3. It would
> > complicate the implementation, while currently it is straightforward for any
> > language that does not have a library available. For Swift, even to flush
> > stdout I needed to write "Linux-specific" code... I do not dare think
> > about what I would need to do to write to fd 3...
> >
> > My opinion is that using stdout for I/O and stderr for logs is a better
> > choice than opening another file descriptor.
> >
> > Thoughts here?
> >
> > --
> >   Michele Sciabarra
> >   openwhisk@sciabarra.com
> >
