hadoop-general mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Scott Carey <sc...@richrelevance.com>
Subject Re: HTTP transport?
Date Sat, 26 Sep 2009 00:36:25 GMT
Ok, I have some thoughts on this.  I might be misinterpreting some use cases here however.

HTTP is very useful and typically performs very well.  It has lots of things built-in too.
 In addition to what you mention, it  has a caching mechanism built-in, range queries, and
all sorts of ways to tag along state if needed.  To top it off there are a lot of testing
and debugging tools available for it.  So from that front using it is very attractive.

However, In my experience zero-copy is not going to be much of a gain performance-wise for
this sort of application, and will limit what can be done.  As long as a servlet doesn't transcode
data and mostly copies, it will be very fast - many multiples of gigabit ethernet speed per
CPU - far more than most disk setups will handle for a while.  Furthermore, it is easier to
optimize disk requests to be 'sequentially chunky' if it goes through the JVM.  And I suspect
that for many use cases, optimizing disk I/O is more valuable than a little bit of extra CPU
spent copying data into and out of the process.
Additionally, I'm not sure CRC checking should occur on the client.  TCP/IP already checksums
packets, so network data corruption over HTTP is not a primary concern.   The big concern
is silent data corruption on the disk.  For the DataNode use case, it should find such errors
as early as possible, and not rely on clients discovering errors.  Then it can coordinate
with the NameNode on fixing the block or discarding it.  So if it has to check the file integrity
anyway, there is no reason to worry about zero-copy.  Avoiding the extra request for the CRC
data at least partially counters the loss of zero-copy.

Additionally, embedding Tomcat tends to be more tricky than Jetty, though that can be overcome.
 One might argue that we don't even want a servlet container, we just want an HTTP connector.
 The Servlet API is familiar, but for a high performance transport it might just be overhead
and restrictive.  Direct access to Tomcat's NIO connector might be significantly lighter-weight
and more flexible.
Tomcat's NIO connector implementation works great and I have had great success with up to
10K connections with the pure Java connector using ordinary byte buffers and about 20 servlet
threads.  But if a large number of open connections are not needed (less than about 5x the
number of CPU core threads) then thread-per-connection servlet containers work ok too.  These
sort of implementation details can evolve over time however.

Just my 2c

-Scott



On 9/11/09 2:41 PM, "Doug Cutting" <cutting@apache.org> wrote:

I'm considering an HTTP-based transport for Avro as the preferred,
high-performance option.

HTTP has lots of advantages.  In particular, it already has
  - lots of authentication, authorization and encryption support;
  - highly optimized servers;
  - monitoring, logging, etc.

Tomcat and other servlet containers support async NIO, where a thread is
not required per connection.  A servlet can process bulk data with a
single copy to and from the socket (bypassing stream buffers).  Calls
can be multiplexed over a single HTTP connection using Comet events.

http://tomcat.apache.org/tomcat-6.0-doc/aio.html

Zero copy is not an option for servlets that generate arbitrary data,
but one can specify a file/start/length tuple and Tomcat will use
sendfile to write the response.  That means that while HDFS datanode
file reads could not be done via RPC, they could be done via HTTP with
zero-copy.  If authentication and authorization are already done in the
HTTP server, this may not be a big loss.  The HDFS client might make two
HTTP requests, one to read a files data, and another to read its
checksums.  The server would then stream the entire block to the client
using sendfile, using TCP flow control as today.

Thoughts?

Doug


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message