hadoop-general mailing list archives

From Doug Cutting <cutt...@apache.org>
Subject HTTP transport?
Date Fri, 11 Sep 2009 21:41:23 GMT
I'm considering an HTTP-based transport for Avro as the preferred, 
high-performance option.

HTTP has lots of advantages.  In particular, it already has
  - lots of authentication, authorization and encryption support;
  - highly optimized servers;
  - monitoring, logging, etc.

Tomcat and other servlet containers support async NIO, where a thread is 
not required per connection.  A servlet can process bulk data with a 
single copy to and from the socket (bypassing stream buffers).  Calls 
can be multiplexed over a single HTTP connection using Comet events.

http://tomcat.apache.org/tomcat-6.0-doc/aio.html

Zero copy is not an option for servlets that generate arbitrary data, 
but one can specify a file/start/length tuple and Tomcat will use 
sendfile to write the response.  That means that while HDFS datanode 
file reads could not be done via RPC, they could be done via HTTP with 
zero-copy.  If authentication and authorization are already done in the 
HTTP server, this may not be a big loss.  The HDFS client might make two 
HTTP requests, one to read a file's data, and another to read its 
checksums.  The server would then stream the entire block to the client 
using sendfile, using TCP flow control as today.
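
To make the file/start/length idea concrete, here is a minimal sketch in plain Java NIO. It does not use Tomcat's sendfile support. It relies on FileChannel.transferTo, which lets the OS use a sendfile-style transfer when the destination is a socket channel and the platform supports it. The class name RegionSender and the method sendRegion are hypothetical, chosen only for this illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.channels.Channels;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper: sends the region [start, start + length) of a file
// to a channel without reading it into a user-space buffer ourselves.
class RegionSender {

    // Returns the number of bytes actually transferred.
    static long sendRegion(Path file, long start, long length,
                           WritableByteChannel out) throws IOException {
        try (FileChannel in = FileChannel.open(file, StandardOpenOption.READ)) {
            // Clamp the region to the end of the file.
            long end = Math.min(start + length, in.size());
            long position = start;
            // transferTo may move fewer bytes than requested, so loop.
            while (position < end) {
                long n = in.transferTo(position, end - position, out);
                if (n <= 0) {
                    break; // nothing more could be written right now
                }
                position += n;
            }
            return Math.max(0, position - start);
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for a block file; a real server would use an existing file.
        Path block = Files.createTempFile("block", ".dat");
        Files.write(block, "0123456789".getBytes(StandardCharsets.US_ASCII));

        // An in-memory sink stands in for the client socket in this demo.
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        long sent = sendRegion(block, 2, 5, Channels.newChannel(sink));

        System.out.println(sent + " bytes: "
                + new String(sink.toByteArray(), StandardCharsets.US_ASCII));
        Files.delete(block);
    }
}
```

With the in-memory sink used in main the bytes are still copied. The zero-copy path only applies when the target is a real socket channel, and even then it depends on the OS. A servlet running under Tomcat would instead hand the file/start/length tuple to the container, as described in the aio.html page linked above.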

Thoughts?

Doug
