hadoop-general mailing list archives

From Doug Cutting <cutt...@apache.org>
Subject Re: HTTP transport?
Date Mon, 28 Sep 2009 17:01:17 GMT
Scott Carey wrote:
> HTTP is very useful and typically performs very well.  It has lots of
> things built-in too.  In addition to what you mention, it  has a
> caching mechanism built-in, range queries, and all sorts of ways to
> tag along state if needed.  To top it off there are a lot of testing
> and debugging tools available for it.  So from that front using it is
> very attractive.

Glad you agree!

> However, in my experience zero-copy is not going to be much of a gain
> performance-wise for this sort of application, and will limit what
> can be done.  As long as a servlet doesn't transcode data and mostly
> copies, it will be very fast - many multiples of gigabit ethernet
> speed per CPU - far more than most disk setups will handle for a
> while.

In MapReduce, datanodes also run map and reduce tasks, so we'd like 
datanodes not only to keep up with disks and networks, but to use 
minimal CPU while doing so.  Zero-copy on the datanode has been shown 
to significantly help MapReduce benchmarks.  That said, zero-copy may 
or may not be significantly better than one-copy.  I intend to 
benchmark that.  But the important thing to measure is not just 
throughput but also idle CPU.
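
To make the comparison concrete, here's a rough sketch of what the 
zero-copy path looks like in Java (illustrative only, not HDFS's 
actual code): FileChannel.transferTo() lets the kernel move file bytes 
directly to a socket (sendfile(2) on Linux), skipping the user-space 
copy that a servlet's read/write loop incurs.

    import java.io.FileInputStream;
    import java.io.IOException;
    import java.nio.channels.FileChannel;
    import java.nio.channels.WritableByteChannel;

    // Hypothetical sketch of serving a block file with zero-copy I/O.
    public class ZeroCopySender {
        static void sendBlock(String path, WritableByteChannel out)
                throws IOException {
            FileInputStream in = new FileInputStream(path);
            try {
                FileChannel fc = in.getChannel();
                long pos = 0;
                long remaining = fc.size();
                while (remaining > 0) {
                    // transferTo may send fewer bytes than requested;
                    // loop until the whole file has been transferred.
                    long sent = fc.transferTo(pos, remaining, out);
                    pos += sent;
                    remaining -= sent;
                }
            } finally {
                in.close();
            }
        }
    }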

> Additionally, I'm not sure CRC checking should occur on the
> client.  TCP/IP already checksums packets, so network data corruption
> over HTTP is not a primary concern.   The big concern is silent data
> corruption on the disk.

I believe that disks are the largest source of data corruption, but I am 
not confident they are the only source.  HDFS uses end-to-end checksums. 
As data is written to HDFS it is immediately checksummed on the client. 
This checksum then lives with the data and is validated on the client 
immediately before the data is returned to the application.  The goal is 
to catch corruption wherever it may occur: on disks, on the network, or 
while buffered in memory.  In addition, the checksum is validated after 
data is transmitted to datanodes but before blocks are stored, so that 
initial network and memory corruptions are caught early and the writing 
process fails, rather than permitting an application to write corrupt 
data.  Finally, datanodes periodically scan for corrupt blocks on disk, 
replacing them with non-corrupt replicas, decreasing the chance that 
over time all replicas become corrupt.
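
As a rough sketch of the end-to-end idea (names hypothetical, not 
HDFS's actual implementation): the writer computes a CRC per chunk, the 
CRC travels with the data, and the reader recomputes and compares it 
just before handing the data to the application.

    import java.io.IOException;
    import java.util.zip.CRC32;

    // Hypothetical sketch of end-to-end chunk checksumming.
    public class ChunkChecksum {
        // Computed on the client as data is written.
        static long checksum(byte[] chunk, int off, int len) {
            CRC32 crc = new CRC32();
            crc.update(chunk, off, len);
            return crc.getValue();
        }

        // Called on the read path; throws if the stored checksum no
        // longer matches, catching corruption on disk, on the network,
        // or while buffered in memory.
        static void verify(byte[] chunk, int off, int len, long stored)
                throws IOException {
            if (checksum(chunk, off, len) != stored) {
                throw new IOException("checksum mismatch: data corrupted");
            }
        }
    }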

> Additionally, embedding Tomcat tends to be more tricky than Jetty,
> though that can be overcome.  One might argue that we don't even want
> a servlet container, we just want an HTTP connector.  The Servlet API
> is familiar, but for a high performance transport it might just be
> overhead and restrictive.  Direct access to Tomcat's NIO connector
> might be significantly lighter-weight and more flexible. Tomcat's NIO
> connector implementation works great and I have had great success
> with up to 10K connections with the pure Java connector using
> ordinary byte buffers and about 20 servlet threads.

I hope to start benchmarking bulk data RPC over the next few weeks. 
I'll probably start with a servlet using Jetty, then see if I can 
increase throughput and decrease CPU utilization through the use of 
things like Tomcat's NIO connector, Grizzly, etc.
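
As a starting point, the one-copy baseline might look like this 
container-agnostic sketch (hypothetical, not the planned code): a 
servlet that streams a block file through a fixed buffer, against which 
NIO and zero-copy variants can be compared.  The "block" parameter is 
made up for illustration.

    import java.io.FileInputStream;
    import java.io.IOException;
    import java.io.InputStream;
    import java.io.OutputStream;
    import javax.servlet.http.HttpServlet;
    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletResponse;

    // Hypothetical one-copy baseline; runs under Jetty or Tomcat.
    public class BlockServlet extends HttpServlet {
        protected void doGet(HttpServletRequest req,
                             HttpServletResponse resp) throws IOException {
            String path = req.getParameter("block");  // hypothetical
            resp.setContentType("application/octet-stream");
            InputStream in = new FileInputStream(path);
            try {
                OutputStream out = resp.getOutputStream();
                byte[] buf = new byte[64 * 1024];
                int n;
                while ((n = in.read(buf)) != -1) {
                    out.write(buf, 0, n);  // the single user-space copy
                }
            } finally {
                in.close();
            }
        }
    }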

Doug
