hadoop-general mailing list archives

From Ryan Rawson <ryano...@gmail.com>
Subject Re: HTTP transport?
Date Wed, 30 Sep 2009 03:20:43 GMT
I wanted to chime in on a few things, since avro is a candidate for
the HBase RPC.

I am not sure that "browser compatibility" is a legitimate requirement
for this kind of thing. It is at odds with high performance in a
number of areas, and isn't the driving factor for using HTTP anyway.

Security - you can get the advantage of security standards, such as
the X.509 SSL cert, without actually using HTTPS.

Headers - I don't really think providing a caching mechanism built
into the RPC layer is a top requirement.  We'd then have to build in a
GET/POST idempotent flag into the Avro IDL, and everyone would have to
get it right, etc.
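
To make that concern concrete, here is a minimal Python sketch of what such a flag would imply. Everything in it is hypothetical: the `idempotent` attribute, the message names, and `http_method_for` are illustrative assumptions, not part of Avro's IDL or any existing transport.

```python
# Hypothetical protocol description: every message carries an "idempotent"
# flag that the protocol author has to set correctly by hand.
PROTOCOL = {
    "get": {"idempotent": True},
    "put": {"idempotent": False},
    "increment": {"idempotent": False},
}


def http_method_for(message_name, protocol=PROTOCOL):
    """Pick the HTTP verb a cache-aware transport would use for a message."""
    spec = protocol[message_name]
    # Only calls declared idempotent may go out as cacheable GETs;
    # anything else (or anything unlabeled) falls back to POST.
    return "GET" if spec.get("idempotent", False) else "POST"


if __name__ == "__main__":
    for name in PROTOCOL:
        print(name, "->", http_method_for(name))
```

If a mutating call such as `increment` were mislabeled as idempotent, an intermediary could serve a stale cached response for it, which is the "everyone would have to get it right" problem.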

Considering my top requirement is "make bulk data access and RPC
rate/sec as high as possible", I'm not sure caching fits in here; it
could even work against that goal.

On Tue, Sep 29, 2009 at 8:06 PM, Scott Carey <scott@richrelevance.com> wrote:
> On 9/29/09 2:57 PM, "stack" <stack@duboce.net> wrote:
>> On Tue, Sep 29, 2009 at 2:08 PM, Doug Cutting <cutting@apache.org> wrote:
>>> Alternately, we could try to make Avro's RPC more HTTP-friendly, and pull
>>> stuff out of Avro's payload into HTTP headers.  The downside of that would
>>> be that, if we still wish to support non-HTTP transports, we'd end up with
>>> duplicated logic.
>> There would be loads of upside I'd imagine if there was a natural mapping of
>> avro payload specifiers and metadata up into http headers in terms of
>> visibility
> There are some very serious disadvantages to headers if overused.
> I highly advise keeping what goes into the URL and headers very specific to
> support well-defined features for this specific transport type.  Otherwise,
> put it in the data payload for all transports.
> A couple header disadvantages:
> * Limited character set allowed.  You can't put just any data in there,
> and you can end up with an inefficient encoding mess that is not easy to
> read.
> * Headers don't take advantage of other transport features.  For example,
> Content-Encoding:gzip provides gzip compression support for the data
> payload, but you can't compress the headers in HTTP.
> On the other hand, custom headers can be handy ways to implement
> transport-specific handshakes or advertise capabilities (which helps build
> in cross-version compatibility).
> But browsers only work with the standard ones, so whatever 'browser
> requirement' is out there is going to be a limited subset no matter how you
> do it.
> This thread brings up the security features.  Payload encryption does not
> seem to be a transport feature -- but it could be done via something like
> Content-Encoding (X-Avro-Content-Encrypted?).  It seems to fit better IMO
> within the payload itself, or at the socket / network level via SSH or a
> secure tunnel.
> Authentication is a better fit for the transport layer -- but as mentioned
> elsewhere if it has to be done for all transports, could it fit in the
> payload somehow?
>> So, are we talking about doing something like the following for a
>> request/response:
>>  GET /avro/org.apache.hadoop.hbase.RegionServer HTTP/1.1
>>  Host: www.example.com
>>
>>  HTTP/1.1 200 OK
>>  Date: Mon, 23 May 2005 22:38:34 GMT
>>  Server: Apache/ (Unix)  (Red-Hat/Linux)
>>  Last-Modified: Wed, 08 Jan 2003 23:11:55 GMT
>>  Etag: "3f80f-1b6-3e1cb03b"
>>  Accept-Ranges: bytes
>>  Content-Length: 438
>>  Connection: close
>>  Content-Type: X-avro/binary
> It's acceptable to drop a lot of the headers above.  Some of them are useful
> to implement extended functionality -- the Etag can be used for caching if
> that were desired, for example.  Keep-Alive connections and chunked
> responses are nice built-ins too.
>> ... or some variation on above on each and every RPC?
>> St.Ack
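
The two header disadvantages Scott lists in the quoted text (restricted character set, no compression) can be illustrated with a small Python sketch. The `metadata` bytes are made up for the example, and base64 is only one assumed way of making binary data header-safe; nothing here reflects an actual Avro wire format.

```python
import base64
import gzip

# Hypothetical binary metadata: 1024 bytes, including values that are not
# legal in an HTTP header field.
metadata = bytes(range(256)) * 4

# Header values are limited to a safe character set, so binary data needs
# an encoding such as base64, which grows it by roughly one third.
header_value = base64.b64encode(metadata).decode("ascii")

# The same bytes carried in the body can be sent with Content-Encoding: gzip.
body = gzip.compress(metadata)

if __name__ == "__main__":
    print("raw bytes:         ", len(metadata))
    print("as header (base64):", len(header_value))
    print("as gzipped body:   ", len(body))
```

The exact sizes depend on the data, but the direction is the point: metadata moved into headers gets larger and less readable, while the same metadata left in the payload can shrink along with the rest of the body.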
