Mailing-List: contact hadoop-dev-help@lucene.apache.org; run by ezmlm
Precedence: bulk
Reply-To: hadoop-dev@lucene.apache.org
Message-ID: <15147971.1147095141660.JavaMail.jira@brutus>
Date: Mon, 8 May 2006 13:32:21 +0000 (GMT+00:00)
From: "Dominik Friedrich (JIRA)" <jira@apache.org>
To: hadoop-dev@lucene.apache.org
Subject: [jira] Commented: (HADOOP-195) transfer map output transfer with
 http instead of rpc
In-Reply-To: <6413697.1146765138386.JavaMail.jira@brutus>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit

    [ http://issues.apache.org/jira/browse/HADOOP-195?page=comments#action_12378420 ] 

Dominik Friedrich commented on HADOOP-195:
------------------------------------------

paul,

i agree, the less you copy the data the better (faster). But moving data in RAM is a few orders of magnitude faster then disk writes (factor 1000-10000 or so). So when optimizing one should start to reduce the slowest operations as much as possible and that are disk IO and network IO. 

I had a short look at the sorter implementation in SequenceFile.java. If I understood the code correctly the sorter reads as big chunks as possible from the input file, sort them in-memory and writes the sorted chunk to a temp file. When the whole input file has been sorted to several sorted chunk files one or more merge runs are started that merge those chuck files to one sorted file which has the complete input file sorted in total order. Before you got the final sorted file the mapper writes a temp file, the sorter reads it and writes several sorted temp files and finaly you run one or more merge phases that have to read those sorted chunks and write the final sorted output.

I'd split the whole sorting. The mapper output collector does not have to write one big output file, it could also write several pre sorted chunks like the sort phase in the current sorter does. The merging phase can be done on the fly when transfering the sorted map output to the reducers. This way you only have to write a few sorted chunks (the number depends on the buffer size of the output collector and the temp data size produced by the mapper). This reduces disk IO and also disk space needed for temp data.

Another possible optimization could be the creation of index files when you've small keys and large values. The collector would dump the values in one big file and put the keys along with the index of the value in sorted index chunks like before. When merging the partial sorted output on the fly while transfering keys and values are joined together again.

I'm not sure if I've the time to implement this and run performance comparisons against the current implementation, but I think this could speed up the sorting quite a lot.

> transfer map output transfer with http instead of rpc
> -----------------------------------------------------
>
>          Key: HADOOP-195
>          URL: http://issues.apache.org/jira/browse/HADOOP-195
>      Project: Hadoop
>         Type: Improvement

>   Components: mapred
>     Versions: 0.2
>     Reporter: Owen O'Malley
>     Assignee: Owen O'Malley
>      Fix For: 0.3
>  Attachments: netstat.log
>
> The data transfer of the map output should be transfered via http instead rpc, because rpc is very slow for this application and the timeout behavior is suboptimal. (server sends data and client ignores it because it took more than 10 seconds to be received.)

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira