OK, I have some thoughts on this, though I might be misinterpreting some of the use cases.

HTTP is very useful and typically performs very well. It has a lot built in, too: in addition to what you mention, it has a caching mechanism, range queries, and all sorts of ways to tag along state if needed. To top it off, there are a lot of testing and debugging tools available for it. So on that front, using it is very attractive.

However, in my experience zero-copy is not going to be much of a performance gain for this sort of application, and it will limit what can be done. As long as a servlet doesn't transcode data and mostly copies, it will be very fast: many multiples of gigabit Ethernet speed per CPU, far more than most disk setups will handle for a while. Furthermore, it is easier to optimize disk requests to be 'sequentially chunky' if the data goes through the JVM. And I suspect that for many use cases, optimizing disk I/O is more valuable than a little bit of extra CPU spent copying data into and out of the process.

Additionally, I'm not sure CRC checking should occur on the client. TCP/IP already checksums packets, so network data corruption over HTTP is not a primary concern. The big concern is silent data corruption on the disk. The DataNode should find such errors as early as possible rather than rely on clients discovering them. Then it can coordinate with the NameNode on fixing the block or discarding it. So if the DataNode has to check file integrity anyway, the data has to pass through the process, and there is no reason to worry about zero-copy. Avoiding the extra request for the CRC data at least partially counters the loss of zero-copy.

Additionally, embedding Tomcat tends to be trickier than embedding Jetty, though that can be overcome. One might argue that we don't even want a servlet container; we just want an HTTP connector. The Servlet API is familiar, but for a high-performance transport it might just be overhead and restrictive.
Direct access to Tomcat's NIO connector might be significantly lighter-weight and more flexible. Tomcat's NIO connector implementation works great, and I have had great success with up to 10K connections with the pure Java connector, using ordinary byte buffers and about 20 servlet threads. But if a large number of open connections is not needed (fewer than about 5x the number of CPU core threads), then thread-per-connection servlet containers work OK too. These sorts of implementation details can evolve over time, however.

Just my 2c

-Scott

On 9/11/09 2:41 PM, "Doug Cutting" wrote:

I'm considering an HTTP-based transport for Avro as the preferred, high-performance option.

HTTP has lots of advantages. In particular, it already has
- lots of authentication, authorization and encryption support;
- highly optimized servers;
- monitoring, logging, etc.

Tomcat and other servlet containers support async NIO, where a thread is not required per connection. A servlet can process bulk data with a single copy to and from the socket (bypassing stream buffers). Calls can be multiplexed over a single HTTP connection using Comet events.

http://tomcat.apache.org/tomcat-6.0-doc/aio.html

Zero copy is not an option for servlets that generate arbitrary data, but one can specify a file/start/length tuple and Tomcat will use sendfile to write the response. That means that while HDFS datanode file reads could not be done via RPC, they could be done via HTTP with zero-copy. If authentication and authorization are already done in the HTTP server, this may not be a big loss.

The HDFS client might make two HTTP requests, one to read a file's data, and another to read its checksums. The server would then stream the entire block to the client using sendfile, using TCP flow control as today.

Thoughts?

Doug
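For anyone following the thread, the two approaches being compared can be sketched roughly as below. This is only an illustration under stated assumptions, not Tomcat's or HDFS's actual code: the class and method names (BlockSender, copyWithCrc, sendfile, writeTempBlock) are hypothetical, the 64 KB buffer size is arbitrary, and java.util.zip.CRC32 simply stands in for whatever checksum the server would really use. The copying path pulls a file/start/length range through the JVM and checksums it on the way out; the other path hands the same range to FileChannel.transferTo, which may be implemented with sendfile when the target is a socket, so the process never sees the bytes and cannot verify them.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.zip.CRC32;

// Hypothetical illustration; none of these names come from Tomcat, Avro, or HDFS.
class BlockSender {

    /** Copying path: stream [start, start + length) through the JVM, checksumming as we go. */
    static long copyWithCrc(Path file, long start, long length, WritableByteChannel out) {
        CRC32 crc = new CRC32();
        ByteBuffer buf = ByteBuffer.allocate(64 * 1024); // heap buffer; size is an arbitrary choice
        try (FileChannel in = FileChannel.open(file, StandardOpenOption.READ)) {
            long pos = start;
            long remaining = length;
            while (remaining > 0) {
                buf.clear();
                buf.limit((int) Math.min(buf.capacity(), remaining));
                int n = in.read(buf, pos);
                if (n <= 0) break;             // end of file reached before 'length' bytes
                buf.flip();
                crc.update(buf.array(), 0, n); // the server-side integrity check would hook in here
                while (buf.hasRemaining()) out.write(buf);
                pos += n;
                remaining -= n;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return crc.getValue();
    }

    /**
     * transferTo path: the JVM never touches the bytes, so nothing can be verified.
     * Whether this is truly zero-copy depends on the OS and on 'out' being a socket channel.
     */
    static long sendfile(Path file, long start, long length, WritableByteChannel out) {
        try (FileChannel in = FileChannel.open(file, StandardOpenOption.READ)) {
            long sent = 0;
            while (sent < length) {
                long n = in.transferTo(start + sent, length - sent, out);
                if (n <= 0) break;             // nothing more to transfer
                sent += n;
            }
            return sent;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Test helper: write 'data' to a temporary file that is deleted on JVM exit. */
    static Path writeTempBlock(byte[] data) {
        try {
            Path p = Files.createTempFile("block", ".dat");
            p.toFile().deleteOnExit();
            Files.write(p, data);
            return p;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = new byte[1 << 20];
        for (int i = 0; i < data.length; i++) data[i] = (byte) i;
        Path block = writeTempBlock(data);

        ByteArrayOutputStream copied = new ByteArrayOutputStream();
        ByteArrayOutputStream sent = new ByteArrayOutputStream();
        long crc = copyWithCrc(block, 0, data.length, Channels.newChannel(copied));
        long n = sendfile(block, 0, data.length, Channels.newChannel(sent));

        System.out.println("copy path: " + copied.size() + " bytes, crc32=" + Long.toHexString(crc));
        System.out.println("transferTo path: " + n + " bytes, no checksum computed");
    }
}
```

Both methods take the same file/start/length arguments, so the only difference a caller sees is that the copying path returns a checksum the server could compare against its stored value before (or while) answering, at the cost of one pass through a JVM buffer.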