avro-dev mailing list archives

From "Philip Zeyliger (JIRA)" <j...@apache.org>
Subject [jira] Commented: (AVRO-341) specify avro transport in spec
Date Sat, 06 Feb 2010 04:11:27 GMT

    [ https://issues.apache.org/jira/browse/AVRO-341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12830451#action_12830451 ]

Philip Zeyliger commented on AVRO-341:

I'm hijacking this thread for the description (as opposed to the title).  Let's start thinking
about a high-performance, secure transport for Avro.

Here's a dump of my current thoughts on this topic, after reading up a bit on SASL, and reading
through some of the Hadoop security patches.

First off, we should probably call this a "protocol".  It's a bit tricky, since we've already
got a notion of Avro protocols, but "transport" reminds people of http://en.wikipedia.org/wiki/Transport_Layer,
i.e., UDP vs TCP, and that's not what we're discussing here.  (On the TCP vs UDP front, let's
focus our efforts first on a TCP protocol.  There might be a lot of value in having a UDP
protocol as well, but it's clear that we'll need a TCP one.)

It's a bit meta, but I'd like us to consider describing Avro's protocol in terms of (and here
the terminology falls down) an Avro protocol, or at least in terms of Avro records.  Instead
of saying "and then there shall be a long, encoded like so, and then it shall be followed by
that many bytes", we should just say, "and then we shall receive a record with the following
schema".  We already do so in part, and I think that's the right direction.  I think it will
make the description of the protocol clearer, and it will let implementations
re-use some schema functionality.  (I think implementations should use the most type-safe
APIs they have available to them, but, hey, that's by definition an implementation detail.)
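
To make the contrast concrete, here is a minimal sketch in Python.  Avro's binary encoding
already defines "bytes" as a long (zig-zag varint) followed by that many bytes, so a record
with a single bytes field expresses the "long, then that many bytes" layout without spelling
it out.  The record name "Payload" and the field name "body" are invented for illustration,
and the hand-rolled encoder only stands in for a real Avro library.

```python
import json

# Hypothetical record schema: the name "Payload" and the field "body" are
# illustrative only and do not come from the Avro spec or this issue.
PAYLOAD_SCHEMA = {
    "type": "record",
    "name": "Payload",
    "fields": [{"name": "body", "type": "bytes"}],
}


def encode_long(n: int) -> bytes:
    """Encode a 64-bit signed integer as an Avro long (zig-zag, then varint)."""
    z = (n << 1) ^ (n >> 63)  # zig-zag: small magnitudes become small codes
    out = bytearray()
    while z & ~0x7F:
        out.append((z & 0x7F) | 0x80)  # low 7 bits, continuation bit set
        z >>= 7
    out.append(z)
    return bytes(out)


def encode_bytes(data: bytes) -> bytes:
    """Encode Avro bytes: a long length followed by the raw bytes."""
    return encode_long(len(data)) + data


def encode_payload(record: dict) -> bytes:
    """Encode a Payload record: its fields in declaration order, no extra header."""
    return encode_bytes(record["body"])


if __name__ == "__main__":
    print(json.dumps(PAYLOAD_SCHEMA))
    print(encode_payload({"body": b"abc"}).hex())
```

The protocol description then only has to name the schema; the byte layout follows from the
encoding rules that implementations already have.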

In terms of the "primitives", here's what I can think of:
 * CALL: this is the work-horse of the RPC, analogous to http://hadoop.apache.org/avro/docs/1.2.0/spec.html#Call+Format.
   If we decide to do schema resolution at the handshake level, we would do it here.  Returns
   the response.  May throw AuthenticationRequired.
 * AUTHENTICATE: this is the command for authentication.  SASL sometimes requires a back and
   forth (until it's "done"); we'd put the hooks for all of that here.
 * DISCOVER: asks the server for information about itself.  Specifically, servers may tell
   clients what protocols they support.  This may throw AuthenticationRequired or return nothing,
   if the server wants to be cagey.  This is in some sense similar to FB303
   (https://svn.apache.org/repos/asf/incubator/thrift/trunk/contrib/fb303/if/fb303.thrift).
   In a friendly environment, a server could tell you who's running it (a username), what
   machine it's on, and arbitrary key/value statistics.
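
The three primitives could be sketched roughly as follows.  Everything here is hypothetical:
the class and field names are made up, the server just echoes CALL payloads, and a
shared-secret comparison stands in for a real SASL exchange (which may take several round
trips).

```python
from enum import Enum


class Command(Enum):
    CALL = "CALL"
    AUTHENTICATE = "AUTHENTICATE"
    DISCOVER = "DISCOVER"


class AuthenticationRequired(Exception):
    """Raised when a command needs an authenticated session."""


class ToyServer:
    """Hypothetical server; not a real Avro or SASL implementation."""

    def __init__(self, secret, protocols, cagey=False):
        self._secret = secret
        self._protocols = list(protocols)
        self._cagey = cagey  # a cagey server hides DISCOVER until authenticated
        self.authenticated = False

    def handle(self, command, payload=None):
        if command is Command.AUTHENTICATE:
            # Real SASL may need several challenge/response steps before it
            # is "done"; this stand-in finishes in a single step.
            self.authenticated = payload == self._secret
            return {"done": self.authenticated}
        if command is Command.DISCOVER:
            if self._cagey and not self.authenticated:
                raise AuthenticationRequired("DISCOVER")
            return {"protocols": list(self._protocols)}
        if command is Command.CALL:
            if not self.authenticated:
                raise AuthenticationRequired("CALL")
            # Echo the request instead of dispatching to a real method.
            return {"response": payload}
        raise ValueError(f"unknown command: {command!r}")
```

A client that gets AuthenticationRequired from CALL or DISCOVER would issue AUTHENTICATE and
then retry.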

We absolutely need to support piggy-backing of commands.  One way to do that is for clients
to simply be able to send multiple commands in a row, without waiting for the response.  Another
is for commands to be able to include subcommands.

We need to support out-of-order responses and "one way" (don't wait for a response) commands.
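
One way to get pipelining, out-of-order responses and one-way commands together is to tag
each request with an id and match responses by that id.  The sketch below shows only the
client-side bookkeeping; the message shape (id / oneway / payload) and all names are
assumptions made for illustration, not a proposed wire format.

```python
import itertools


class PipeliningClient:
    """Hypothetical client bookkeeping; `send` is any callable that writes one message."""

    def __init__(self, send):
        self._send = send
        self._ids = itertools.count(1)
        self._pending = {}  # request id -> callback awaiting the response

    def call(self, payload, on_response):
        """Send a request without blocking; the callback fires when its response arrives."""
        request_id = next(self._ids)
        self._pending[request_id] = on_response
        self._send({"id": request_id, "oneway": False, "payload": payload})
        return request_id

    def call_oneway(self, payload):
        """Send a request for which no response is expected, so nothing is tracked."""
        self._send({"id": None, "oneway": True, "payload": payload})

    def on_message(self, message):
        """Deliver a response; responses may arrive in any order."""
        callback = self._pending.pop(message["id"])
        callback(message["payload"])

    @property
    def outstanding(self):
        return len(self._pending)
```

Sending several commands in a row is then just several call() invocations before any
on_message() is processed.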

We still need to do framing.  Also, SASL requires that all bytes after the successful SASL
authentication are wrapped by SASL, so servers and clients need to have a state machine that
understands that, and wraps appropriately.  (We could maybe have avoided framing if we supported
framing directly in Avro's string primitive type, like we do in Avro's map type, by having
a negative string length indicate a string that is continued.)
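
A rough sketch of the state machine, assuming a 4-byte big-endian length prefix per frame
and a pair of wrap/unwrap callables that get switched in once authentication succeeds.  The
class and method names are hypothetical, and in this sketch only frame bodies are wrapped,
which is just one possible reading of the SASL requirement.

```python
import struct


def identity(data: bytes) -> bytes:
    return data


class FramedChannel:
    """Hypothetical framing layer: 4-byte big-endian length, then the frame body."""

    def __init__(self):
        # Before authentication, frame bodies travel unwrapped.
        self._wrap = identity
        self._unwrap = identity

    def security_established(self, wrap, unwrap):
        """State change: every later frame body passes through the security layer."""
        self._wrap, self._unwrap = wrap, unwrap

    def frame(self, message: bytes) -> bytes:
        body = self._wrap(message)
        return struct.pack(">I", len(body)) + body

    def unframe(self, buffer: bytes):
        """Return (message, remaining bytes), or (None, buffer) if a frame is incomplete."""
        if len(buffer) < 4:
            return None, buffer
        (length,) = struct.unpack(">I", buffer[:4])
        if len(buffer) < 4 + length:
            return None, buffer
        body = buffer[4:4 + length]
        return self._unwrap(body), buffer[4 + length:]
```

Both peers have to flip state at the same point in the byte stream, which is why the switch
is tied to the end of the AUTHENTICATE exchange rather than left to each side's discretion.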

Finally, we need to think hard about how to version this protocol itself.  It's appealing
to be able to add commands in the future ("oneway" is an example) or to enrich the response
of commands like "DISCOVER".  It's noteworthy that text-based protocols like IMAP have had
little trouble extending themselves to stuff like SASL, because they could just augment what
existing commands did.  (RFC 4959  is pretty short.)   A simple approach would be to bootstrap
it by sending hash(avro protocol schema), and doing much like we do with calls right now.
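
The hash-based bootstrap might look roughly like this.  MD5 is an assumption, picked because
Avro's existing handshake identifies protocols by a 16-byte MD5 of the protocol text; the
registry class and the "Transport" protocol in the usage lines are made up for illustration.

```python
import hashlib
import json


def protocol_hash(protocol_text: str) -> bytes:
    """Hash the protocol's JSON text (MD5 is assumed here, see the note above)."""
    return hashlib.md5(protocol_text.encode("utf-8")).digest()


class VersionRegistry:
    """Hypothetical per-peer cache of transport protocol versions, keyed by hash."""

    def __init__(self):
        self._known = {}

    def register(self, protocol_text: str) -> bytes:
        digest = protocol_hash(protocol_text)
        self._known[digest] = protocol_text
        return digest

    def lookup(self, digest: bytes):
        """Return the protocol text for a known hash, or None.

        None means the peer must be asked to send its full protocol text.
        """
        return self._known.get(digest)


if __name__ == "__main__":
    v1 = json.dumps({"protocol": "Transport", "messages": {"call": {}}}, sort_keys=True)
    registry = VersionRegistry()
    print(registry.register(v1).hex())
```

A peer that sends an unknown hash would fall back to sending the full text once, after which
both sides can resolve the two versions against each other.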

Anyway, that's where I am right now.  Looking forward to more discussion.

-- Philip

> specify avro transport in spec
> ------------------------------
>                 Key: AVRO-341
>                 URL: https://issues.apache.org/jira/browse/AVRO-341
>             Project: Avro
>          Issue Type: Improvement
>          Components: spec
>            Reporter: Doug Cutting
> We should develop a high-performance, secure, transport for Avro.  This should be named
> with avro: uris.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
