hadoop-general mailing list archives

From Kan Zhang <...@yahoo-inc.com>
Subject Re: HTTP transport?
Date Wed, 14 Oct 2009 01:59:08 GMT

On 10/9/09 12:56 PM, "Doug Cutting" <cutting@apache.org> wrote:

> Sanjay Radia wrote:
>> Will the RPC over HTTP be transparent, so that we can replace it with a
>> different layer if needed?
> Yes.
>> My worry was the separation of data and checksums; someone had mentioned
>> that one could do this over 2 RPCs - that is not transparent.
> That was suggested as a possibility if we did not want to use RPC for
> data, but rather raw HTTP, e.g., with a separate URL per block.  The
> zerocopy support built into most HTTP servers only supports entire
> responses from a single file, so if we wanted to take advantage of these
> zerocopy implementations we'd not use RPC for block access, but could
> use HTTP and hence share security, etc.  Using raw HTTP for block access
> might also perform better, since it can use TCP flow control, rather
> than RPC call/response.  In my microbenchmarks, RPC call/response was
> fast enough to easily saturate disks and networks, so that might be
> moot, although RPC call/response for file data may use more CPU than
> we'd like.  With our own transport implementation we could get RPC
> call/response to use zerocopy for file data.
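The zerocopy path Doug mentions is, in most HTTP servers, the kernel's sendfile mechanism, which Java exposes as FileChannel.transferTo. A minimal sketch of serving a whole file that way (class and method names here are illustrative, not Hadoop code):

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class ZeroCopySketch {
    // Transfer an entire file to a channel without copying through user space.
    // On Linux, FileChannel.transferTo maps to sendfile(2) when the target is
    // a socket channel -- the zerocopy path built into most HTTP servers, and
    // the reason it only supports entire responses from a single file.
    static long serveWholeFile(Path file, WritableByteChannel out) throws IOException {
        try (FileChannel in = FileChannel.open(file, StandardOpenOption.READ)) {
            long size = in.size();
            long sent = 0;
            // transferTo may send fewer bytes than requested, so loop.
            while (sent < size) {
                sent += in.transferTo(sent, size - sent, out);
            }
            return sent;
        }
    }

    public static void main(String[] args) throws IOException {
        Path src = Files.createTempFile("block", ".dat");
        Path dst = Files.createTempFile("copy", ".dat");
        Files.write(src, "0123456789".repeat(1000).getBytes());
        try (FileChannel out = FileChannel.open(dst, StandardOpenOption.WRITE)) {
            System.out.println("sent=" + serveWholeFile(src, out));
        }
    }
}
```

An RPC layer, by contrast, has to frame each response in a call/response envelope, which forces the file data through user-space buffers unless the transport is built with zerocopy in mind.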

One problem I see with using HTTP is that it's expensive to provide data
encryption. We're currently adding 2 authentication mechanisms (Kerberos and
DIGEST-MD5) to our existing RPC. Both of them can provide data encryption
for subsequent communication over the authenticated channel. However, when
similar authentication mechanisms are specified for HTTP (SPNEGO and HTTP
DIGEST, respectively), they don't provide data encryption (correct me if I'm
wrong). For data encryption over HTTP, one has to use SSL, which is
expensive.

