hadoop-common-dev mailing list archives

From "Dominik Friedrich (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-195) transfer map output transfer with http instead of rpc
Date Sun, 07 May 2006 20:10:21 GMT
    [ http://issues.apache.org/jira/browse/HADOOP-195?page=comments#action_12378313 ] 

Dominik Friedrich commented on HADOOP-195:

Has anybody tried using APR (the Apache Portable Runtime) through a JNI wrapper, as Tomcat does?
With such a wrapper you could use OS features like sendfile, epoll, the random number generator,
and so on. I haven't used it myself; I've only seen some performance tests of JBoss Web, which
uses it.
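APR's sendfile is not reachable from plain Java without JNI, but Java NIO offers a comparable zero-copy path: `FileChannel.transferTo`, which the JDK documentation says may move bytes from the filesystem cache to the target channel without copying them (on Linux this is typically backed by sendfile). The sketch below is that swapped-in NIO technique, not the APR/JNI approach asked about above. The class name `ZeroCopySend` is hypothetical, and the code assumes a blocking target channel, for example a blocking `SocketChannel` serving a map output file.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical helper: send a whole file (e.g. a map output) to a channel. */
public final class ZeroCopySend {
    private ZeroCopySend() { }

    /**
     * Returns the number of bytes sent. Assumes a BLOCKING target channel.
     * For a SocketChannel target the JDK may use an OS zero-copy call such as
     * sendfile; for other targets it falls back to an ordinary buffered copy.
     */
    public static long sendFile(Path path, WritableByteChannel target) throws IOException {
        try (FileChannel in = FileChannel.open(path, StandardOpenOption.READ)) {
            long size = in.size();
            long pos = 0;
            // transferTo may move fewer bytes than requested, so loop until done.
            while (pos < size) {
                long n = in.transferTo(pos, size - pos, target);
                if (n <= 0) {
                    break; // with a blocking target this only happens if the file shrank
                }
                pos += n;
            }
            return pos;
        }
    }
}
```

A non-blocking socket could accept fewer bytes per call, so a selector-driven server would instead resume the transfer on each write-readiness event.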

This might be a bit off topic, but Java NIO has been mentioned before. I played around with
Java NIO a few weeks ago to see where it could be useful in Nutch/Hadoop. In my simple tests
I found no significant performance improvement for file IO. I guess the tests were just too
simple (serializing/deserializing Java objects to/from disk) to give useful results.

I also compared the network throughput of a multiplexed socket with that of a one-thread-per-client
design. With NIO the throughput was almost independent of the number of concurrent connections,
while with one thread per client the threading overhead became very significant at 100+ threads.
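A minimal sketch of the multiplexed side of that comparison, assuming a trivial echo protocol in place of the serialized Java objects from the test: a single thread uses one `Selector` to accept and serve every connection, so adding clients adds registered keys rather than threads. `MultiplexedEchoServer` is a hypothetical name, and the write path is deliberately simplified.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.util.ArrayList;
import java.util.Iterator;

/** Serves many connections from ONE thread via a Selector (hypothetical echo protocol). */
public final class MultiplexedEchoServer implements Runnable {
    private final Selector selector;
    private final ServerSocketChannel server;

    public MultiplexedEchoServer(int port) throws IOException {
        selector = Selector.open();
        server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress("127.0.0.1", port)); // port 0 = any free port
        server.configureBlocking(false);
        server.register(selector, SelectionKey.OP_ACCEPT);
    }

    public int port() {
        return server.socket().getLocalPort();
    }

    @Override
    public void run() {
        try {
            // Interrupting this thread wakes up select() and ends the loop.
            while (!Thread.currentThread().isInterrupted()) {
                selector.select();
                Iterator<SelectionKey> it = selector.selectedKeys().iterator();
                while (it.hasNext()) {
                    SelectionKey key = it.next();
                    it.remove();
                    if (!key.isValid()) continue;
                    if (key.isAcceptable()) {
                        accept();
                    } else if (key.isReadable()) {
                        echo(key);
                    }
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            // Close every registered channel (clients and server), then the selector.
            for (SelectionKey k : new ArrayList<>(selector.keys())) {
                try { k.channel().close(); } catch (IOException ignored) { }
            }
            try { selector.close(); } catch (IOException ignored) { }
        }
    }

    private void accept() throws IOException {
        SocketChannel client = server.accept();
        if (client == null) return; // no pending connection after all
        client.configureBlocking(false);
        // Each connection carries its own buffer as the key attachment.
        client.register(selector, SelectionKey.OP_READ, ByteBuffer.allocate(4096));
    }

    private static void echo(SelectionKey key) {
        SocketChannel client = (SocketChannel) key.channel();
        ByteBuffer buf = (ByteBuffer) key.attachment();
        try {
            if (client.read(buf) < 0) { // peer closed the connection
                client.close();         // closing also cancels the key
                return;
            }
            buf.flip();
            client.write(buf); // simplified: unwritten bytes stay buffered until the next read event
            buf.compact();
        } catch (IOException e) {
            try { client.close(); } catch (IOException ignored) { }
        }
    }
}
```

A production version would register `OP_WRITE` when a write is partial instead of waiting for the client to send more data.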

My testbed was a simple server with two IO threads and a few worker threads, plus a bunch of clients
that sent messages (serialized Java objects) to the server. On the server side, one IO thread
read messages from the sockets and put them into a blocking queue, and the other IO thread took
outgoing messages from a second blocking queue and sent them. The worker threads pulled messages
from the in-queue, processed them (in my test they just copied the message), and put their results
on the out-queue. This way the server could handle a few thousand connections without problems.
This design, or something similar, might be useful for the namenode or for distributed search, as
mentioned before.
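The queue hand-off between the IO threads and the workers can be sketched with `java.util.concurrent` blocking queues. This is an illustration under assumptions, not the testbed code: the `Message` type (a connection id plus a byte payload), the class name `QueuePipeline`, and the poison-pill shutdown are all hypothetical, and the socket-facing reader and writer threads are left out; they would call `submit` and `takeResult` respectively.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** Hypothetical in-queue -> workers -> out-queue pipeline. */
public final class QueuePipeline {
    /** Hypothetical message: originating connection id plus raw payload. */
    public static final class Message {
        public final int connectionId;
        public final byte[] payload;

        public Message(int connectionId, byte[] payload) {
            this.connectionId = connectionId;
            this.payload = payload;
        }
    }

    // Sentinel that tells one worker to exit (compared by identity).
    private static final Message POISON = new Message(-1, new byte[0]);

    private final BlockingQueue<Message> inQueue = new LinkedBlockingQueue<>();
    private final BlockingQueue<Message> outQueue = new LinkedBlockingQueue<>();
    private final List<Thread> workers = new ArrayList<>();

    public QueuePipeline(int nWorkers) {
        for (int i = 0; i < nWorkers; i++) {
            Thread w = new Thread(this::workLoop, "worker-" + i);
            workers.add(w);
            w.start();
        }
    }

    /** Called by the reading IO thread for each decoded message. */
    public void submit(Message m) throws InterruptedException {
        inQueue.put(m);
    }

    /** Called by the writing IO thread; blocks until a result is available. */
    public Message takeResult() throws InterruptedException {
        return outQueue.take();
    }

    private void workLoop() {
        try {
            while (true) {
                Message m = inQueue.take();
                if (m == POISON) return;
                // The "work" mirrors the test described above: copy the message.
                outQueue.put(new Message(m.connectionId, m.payload.clone()));
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Sends one poison pill per worker and waits for all of them to finish. */
    public void shutdown() throws InterruptedException {
        for (int i = 0; i < workers.size(); i++) inQueue.put(POISON);
        for (Thread w : workers) w.join();
    }
}
```

Because the workers only touch the two queues, the number of worker threads stays fixed no matter how many connections the IO threads multiplex.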

> transfer map output transfer with http instead of rpc
> -----------------------------------------------------
>          Key: HADOOP-195
>          URL: http://issues.apache.org/jira/browse/HADOOP-195
>      Project: Hadoop
>         Type: Improvement

>   Components: mapred
>     Versions: 0.2
>     Reporter: Owen O'Malley
>     Assignee: Owen O'Malley
>      Fix For: 0.3

> The data transfer of the map output should be transferred via http instead of rpc, because
> rpc is very slow for this application and the timeout behavior is suboptimal. (The server sends
> data and the client ignores it because it took more than 10 seconds to be received.)

This message is automatically generated by JIRA.