hadoop-general mailing list archives

From Doug Cutting <cutt...@apache.org>
Subject Re: HTTP transport?
Date Fri, 09 Oct 2009 17:49:55 GMT
Owen O'Malley wrote:
> SPNEGO is the 
> standard method of using Kerberos with HTTP and we are planning to use 
> that for the web UI's.

Java 6 also supports using SPNEGO for RPC over HTTP out of the box:

http://java.sun.com/javase/6/docs/technotes/guides/net/http-auth.html
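
For anyone who wants to try this, here is a rough sketch of the JVM-side setup, based on my reading of the guide linked above. The class name `SpnegoSetup`, the file names, and the credentials are placeholders of mine, not anything in Hadoop. It only sets the standard Kerberos system properties and installs a default `Authenticator`.

```java
import java.net.Authenticator;
import java.net.PasswordAuthentication;

// Hypothetical helper, for illustration only.
final class SpnegoSetup {
    // krb5Conf and jaasConf are paths to site-specific files (assumed to exist).
    static void configure(String krb5Conf, String jaasConf,
                          final String user, final char[] password) {
        // Standard JDK properties for locating the Kerberos and JAAS configuration.
        System.setProperty("java.security.krb5.conf", krb5Conf);
        System.setProperty("java.security.auth.login.config", jaasConf);
        // Let the GSS layer obtain credentials itself rather than from a Subject.
        System.setProperty("javax.security.auth.useSubjectCredsOnly", "false");

        // Supplies a principal and password if the JVM needs to ask for them.
        Authenticator.setDefault(new Authenticator() {
            @Override
            protected PasswordAuthentication getPasswordAuthentication() {
                return new PasswordAuthentication(user, password);
            }
        });
    }
}
```

With that in place, an ordinary `HttpURLConnection` request to a server that answers with a Negotiate challenge should, as I understand the guide, handle the SPNEGO exchange without further application code. I have not measured anything with this setup.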

> I also have serious doubts about performance, but that is hard to answer 
> until we have code to test.

The good news is that, since the HTTP stuff is already implemented, we 
can test its performance easily.  Performance of insecure access over 
HTTP looks good so far.  It's an open question how much HTTP-based 
security will slow things down versus non-HTTP-based security.

> It is an interesting question how much we 
> depend on being able to answer queries out of order. There are some 
> parts of the code where overlapping requests from the same client 
> matter. In particular, the terasort scheduler uses threads to access the 
> namenode. That would stop providing any pipelining, which I believe 
> would be significant.

No, we wouldn't stop any pipelining; we'd just use more connections to 
implement it.  With HttpClient one can limit the number of pooled 
connections per host:

http://hc.apache.org/httpclient-3.x/apidocs/org/apache/commons/httpclient/MultiThreadedHttpConnectionManager.html#setMaxConnectionsPerHost%28int%29
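
To make the idea concrete, here is a small JDK-only sketch of overlapping requests sharing a capped number of connections to one host. It is a stand-in for what the HttpClient connection manager does, not its implementation: `PerHostLimiter` and `runOverlapping` are names I made up, and the "request" is just a short sleep.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a per-host connection cap.
final class PerHostLimiter {
    private final Semaphore permits;
    private final AtomicInteger inUse = new AtomicInteger();
    private final AtomicInteger peak = new AtomicInteger();

    PerHostLimiter(int maxConnectionsPerHost) {
        permits = new Semaphore(maxConnectionsPerHost);
    }

    // Runs one request, blocking while all "connections" are busy.
    <T> T call(Callable<T> request) throws Exception {
        permits.acquire();
        int now = inUse.incrementAndGet();
        peak.accumulateAndGet(now, Math::max); // remember the high-water mark
        try {
            return request.call();
        } finally {
            inUse.decrementAndGet();
            permits.release();
        }
    }

    int peakInUse() {
        return peak.get();
    }

    // Issues `requests` overlapping calls from separate threads and
    // returns the largest number that were in flight at once.
    static int runOverlapping(int requests, int maxPerHost) throws Exception {
        final PerHostLimiter limiter = new PerHostLimiter(maxPerHost);
        ExecutorService threads = Executors.newFixedThreadPool(requests);
        try {
            List<Future<Integer>> replies = new ArrayList<Future<Integer>>();
            for (int i = 0; i < requests; i++) {
                final int id = i;
                Callable<Integer> task = () -> limiter.call(() -> {
                    Thread.sleep(20); // simulated round trip
                    return id;
                });
                replies.add(threads.submit(task));
            }
            for (Future<Integer> reply : replies) {
                reply.get(); // wait for every request to finish
            }
            return limiter.peakInUse();
        } finally {
            threads.shutdown();
        }
    }
}
```

Calling `PerHostLimiter.runOverlapping(8, 3)` starts eight client threads but never lets more than three requests run at once, so a multi-threaded caller still gets overlap, bounded by the cap.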

Connections are not free of course, but Jetty has been benchmarked at 
20,000 concurrent connections:

http://cometdaily.com/2008/01/07/20000-reasons-that-comet-scales/

> In short, I think that an HTTP transport is great for playing with, but 
> I don't think you can assume it will work as the primary transport.

I agree, we cannot assume it.  But it's easy to try it and see how it 
fares.  Any investment in getting it working is perhaps not wasted, 
since, besides providing a performance baseline, it also may be useful 
to provide HTTP-based access to services even if a higher-performance 
option is implemented.

Doug
