hadoop-general mailing list archives

From Scott Carey <sc...@richrelevance.com>
Subject Re: HTTP transport?
Date Wed, 30 Sep 2009 03:06:22 GMT

On 9/29/09 2:57 PM, "stack" <stack@duboce.net> wrote:

> On Tue, Sep 29, 2009 at 2:08 PM, Doug Cutting <cutting@apache.org> wrote:
>> Alternately, we could try to make Avro's RPC more HTTP-friendly, and pull
>> stuff out of Avro's payload into HTTP headers.  The downside of that would
>> be that, if we still wish to support non-HTTP transports, we'd end up with
>> duplicated logic.
> There would be loads of upside I'd imagine if there was a natural mapping of
> avro payload specifiers and metadata up into http headers in terms of
> visibility

There are some very serious disadvantages to headers if overused.

I strongly advise keeping what goes into the URL and headers limited to
well-defined features of this specific transport type.  Anything else
belongs in the data payload, where it works for all transports.

A couple header disadvantages:
* Limited character set allowed.  You can't put any data in there you want,
and you can end up with an inefficient encoding mess that is not easy to
* Headers don't take advantage of other transport features.  For example,
Content-Encoding:gzip provides gzip compression support for the data
payload, but you can't compress the headers in HTTP.
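
A minimal sketch of the two points above, in Python. The header name
X-Avro-Meta and the sample bytes are made up for illustration; nothing here
reflects an actual Avro wire format. It only shows that arbitrary bytes must
be re-encoded (base64 here) to fit in a header, while the same bytes in the
body can be gzip-compressed.

```python
import base64
import gzip

# Stand-in for binary metadata; the real Avro payload layout is not assumed here.
payload = bytes(range(256)) * 16

# Option 1: carry it in a header (hypothetical name X-Avro-Meta).
# Header values only allow a limited character set, so the bytes are
# base64-encoded, which grows them by roughly a third.
header_value = base64.b64encode(payload).decode("ascii")

# Option 2: carry it in the body and let Content-Encoding: gzip shrink it.
body = gzip.compress(payload)

print("raw bytes:       ", len(payload))
print("base64 in header:", len(header_value))
print("gzip in body:    ", len(body))
```

How much gzip saves depends entirely on the data; this sample is deliberately
repetitive.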

On the other hand, custom headers can be a handy way to implement
transport-specific handshakes or advertise capabilities (which helps build
in cross-version compatibility).
But browsers only work with the standard ones, so whatever 'browser
requirement' is out there is going to be a limited subset no matter how you
do it.
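
As a rough illustration of advertising capabilities through a custom header,
here is a sketch in Python. The header name X-Avro-Capabilities and the
capability tokens are hypothetical, invented for this example only.

```python
def negotiate(client_header, server_supported):
    """Return the capabilities both sides understand, in the client's order.

    client_header is the value of a hypothetical X-Avro-Capabilities header,
    e.g. "gzip, chunked, v2-handshake". Unknown tokens are simply ignored,
    which is what lets old and new versions keep talking to each other.
    """
    offered = [tok.strip().lower() for tok in client_header.split(",")]
    return [cap for cap in offered if cap and cap in server_supported]


# A newer client talking to an older server that knows fewer capabilities.
agreed = negotiate("gzip, chunked, v2-handshake", {"gzip", "v2-handshake"})
print(agreed)
```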

This thread brings up the security features.  Payload encryption does not
seem to be a transport feature -- but it could be done via something like
Content-Encoding (X-Avro-Content-Encrypted?).  It seems to fit better IMO
within the payload itself, or at the socket / network level via SSH or a
secure tunnel.

Authentication is a better fit for the transport layer -- but as mentioned
elsewhere if it has to be done for all transports, could it fit in the
payload somehow? 

> So, are we're talking about doing something like following for a
> request/response:
>  GET /avro/org.apache.hadoop.hbase.RegionServer HTTP/1.1
>  Host: www.example.com
>  HTTP/1.1 200 OK
>  Date: Mon, 23 May 2005 22:38:34 GMT
>  Server: Apache/ (Unix)  (Red-Hat/Linux)
>  Last-Modified: Wed, 08 Jan 2003 23:11:55 GMT
>  Etag: "3f80f-1b6-3e1cb03b"
>  Accept-Ranges: bytes
>  Content-Length: 438
>  Connection: close
>  Content-Type: X-avro/binary

It's acceptable to drop a lot of the headers above.  Some of them are useful
to implement extended functionality -- the Etag can be used for caching if
that were desired, for example.  Keep-Alive connections and chunked
responses are nice built-ins too.
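
A small sketch of the Etag caching idea, in Python. It assumes the server
derives the tag from a hash of the response body, which is one common choice
and not something this thread has settled on; the function names are
illustrative.

```python
import hashlib


def make_etag(body):
    # Quoted string, as Etag values are written in HTTP.
    return '"' + hashlib.sha1(body).hexdigest() + '"'


def respond(request_headers, body):
    """Return (status, headers, body) for a GET, honoring If-None-Match."""
    etag = make_etag(body)
    if request_headers.get("If-None-Match") == etag:
        # The client's cached copy is still current: send no body at all.
        return 304, {"ETag": etag}, b""
    return 200, {"ETag": etag, "Content-Length": str(len(body))}, body


data = b"some serialized response"
status1, headers1, _ = respond({}, data)
status2, _, body2 = respond({"If-None-Match": headers1["ETag"]}, data)
print(status1, status2, len(body2))
```

The second request carries the tag from the first response, so the server can
answer without resending the payload.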

> ... or some variation on above on each and every RPC?
> St.Ack
