hadoop-common-dev mailing list archives

From "eric baldeschwieler (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-331) map outputs should be written to a single output file with an index
Date Thu, 29 Jun 2006 00:29:31 GMT
    [ http://issues.apache.org/jira/browse/HADOOP-331?page=comments#action_12418346 ] 

eric baldeschwieler commented on HADOOP-331:
--------------------------------------------

I think we should do this in an incremental fashion, so...

Each map task would continue to output its own state only.  Although centralizing this per
node is an interesting idea.  I can see how this might be a real benefit in some cases.  Perhaps
you should file this enhancement idea.

Ditto with the transmission to other nodes.  Interesting, but complicated idea.  Maybe you
should file it.  I think that can be taken up at a later date, although feedback on how you
would enhance this change to support such later work is welcome.

A spill means dumping the buffered output to disk once we have accumulated enough data.  Yes,
something like a 100 MB buffer seems right.  (possibly configurable)
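
To make the spill trigger concrete, here is a rough sketch in Python (not Hadoop code).  The
class name, the on_spill callback and the tiny 10-byte threshold are all made up for
illustration; a real buffer would be on the order of the 100 MB mentioned above.

```python
class SpillingBuffer:
    """Buffers (partition, record) pairs and spills once a byte threshold is reached.

    Hypothetical illustration only; names and structure are assumptions.
    """

    def __init__(self, spill_threshold_bytes, on_spill):
        self.spill_threshold_bytes = spill_threshold_bytes
        self.on_spill = on_spill      # assumed callback that would write a spill to disk
        self.records = []
        self.buffered_bytes = 0

    def collect(self, partition, record):
        """Add one record and spill if the buffer has grown past the threshold."""
        self.records.append((partition, record))
        self.buffered_bytes += len(record)
        if self.buffered_bytes >= self.spill_threshold_bytes:
            self.flush()

    def flush(self):
        """Hand the buffered records to the spill callback and reset the buffer."""
        if not self.records:
            return
        self.on_spill(self.records)
        self.records = []
        self.buffered_bytes = 0


# Usage: a 10-byte threshold with 4-byte records spills after every third record.
spills = []
buf = SpillingBuffer(spill_threshold_bytes=10, on_spill=spills.append)
for i in range(5):
    buf.collect(i % 2, b"abcd")
buf.flush()  # final spill for whatever is left at the end of the task
```

A task whose output is much larger than the buffer ends up with several spills, which is
the case the merge discussion below is about.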

I think the goal should be that each reduce only reads a single range.  That will keep the
client code simple and will keep us from thrashing as we scale.  This may require some thought,
since if you have a small number of reduce tasks, reading from multiple ranges may prove
more seek efficient than doing the merge.
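
As a sketch of what "a single range per reduce" could look like after several spills, the
Python below merges spills by concatenating each reduce's pieces.  It assumes each spill is
held as (bytes, index) with index mapping reduce id to (offset, length), and that the data is
unsorted so plain concatenation is enough; both are assumptions for illustration only.

```python
import io


def merge_spills(spills, num_reduces, out):
    """Merge spills so that each reduce ends up with one contiguous byte range.

    spills: list of (data_bytes, index), index maps reduce id -> (offset, length).
    Returns the merged index, again reduce id -> (offset, length).
    """
    merged_index = {}
    for reduce_id in range(num_reduces):
        start = out.tell()
        for data, index in spills:
            # A spill may hold nothing for this reduce; treat that as an empty range.
            offset, length = index.get(reduce_id, (0, 0))
            out.write(data[offset:offset + length])
        merged_index[reduce_id] = (start, out.tell() - start)
    return merged_index


# Usage: two spills, two reduces.
spill_a = (b"aaBB", {0: (0, 2), 1: (2, 2)})
spill_b = (b"cDDD", {0: (0, 1), 1: (1, 3)})
merged = io.BytesIO()
merged_index = merge_spills([spill_a, spill_b], 2, merged)
```

Without the merge, a reduce would instead have to fetch one range per spill, which is the
trade-off mentioned above for jobs with few reduce tasks.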

If we do block compression for intermediates, you would need to align the blocks to reduce
targets, but I don't think we should try to do that in the first version of this.  Especially
given that this data will not be sorted, the return from block compression may not be that great.
(yes, I can construct counter examples, but let's deal with this as a separate project).
Checksums also need to be target aligned.
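
One way to read "target aligned" checksums is one checksum per reduce range, so that a reduce
can verify exactly the bytes it fetched.  The Python sketch below uses CRC32 from zlib purely
as a stand-in; the choice of checksum and the function names are assumptions, not a statement
about what Hadoop does.

```python
import zlib


def checksum_ranges(data, index):
    """Compute one CRC32 per reduce range; index maps reduce id -> (offset, length)."""
    return {
        reduce_id: zlib.crc32(data[offset:offset + length])
        for reduce_id, (offset, length) in index.items()
    }


def verify_range(data, index, checksums, reduce_id):
    """Check a single reduce's range without touching any other range."""
    offset, length = index[reduce_id]
    return zlib.crc32(data[offset:offset + length]) == checksums[reduce_id]


# Usage: corrupting reduce 1's range leaves reduce 0's checksum valid.
data = b"aacBBDDD"
index = {0: (0, 3), 1: (3, 5)}
checksums = checksum_ranges(data, index)
corrupted = b"aacBBDDX"
```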

> map outputs should be written to a single output file with an index
> -------------------------------------------------------------------
>
>          Key: HADOOP-331
>          URL: http://issues.apache.org/jira/browse/HADOOP-331
>      Project: Hadoop
>         Type: Improvement

>   Components: mapred
>     Versions: 0.3.2
>     Reporter: eric baldeschwieler
>     Assignee: Yoram Arnon
>      Fix For: 0.5.0

>
> The current strategy of writing a file per target map is consuming a lot of unused buffer
> space (causing out of memory crashes) and puts a lot of burden on the FS (many opens, inodes
> used, etc).
> I propose that we write a single file containing all output and also write an index file
> IDing which byte range in the file goes to each reduce.  This will remove the issue of buffer
> waste, address scaling issues with number of open files and generally set us up better for
> scaling.  It will also have advantages with very small inputs, since the buffer cache will
> reduce the number of seeks needed and the data serving node can open a single file and just
> keep it open rather than needing to do directory and open ops on every request.
> The only issue I see is that in cases where the task output is substantially larger
> than its input, we may need to spill multiple times.  In this case, we can do a merge after
> all spills are complete (or during the final spill).
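
A minimal sketch of the proposal in the description above, in Python rather than Hadoop code:
all partitions go into one file back to back, and an index records the byte range for each
reduce.  The function names and the in-memory index layout are hypothetical.

```python
import io


def write_map_output(records_by_reduce, out):
    """Write every partition to `out` back to back.

    Returns an index mapping reduce id -> (offset, length) within the single file.
    """
    index = {}
    for reduce_id in sorted(records_by_reduce):
        start = out.tell()
        for record in records_by_reduce[reduce_id]:
            out.write(record)
        index[reduce_id] = (start, out.tell() - start)
    return index


def read_range(data_file, index, reduce_id):
    """Serve one reduce's data with a single seek and read on the already-open file."""
    offset, length = index[reduce_id]
    data_file.seek(offset)
    return data_file.read(length)


# Usage: three reduces, one of which receives no data.
output = io.BytesIO()
output_index = write_map_output({0: [b"aa", b"bbb"], 1: [b"c"], 2: []}, output)
```

The serving side only needs the one open file plus the index, which is the point about
avoiding directory and open operations on every request.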

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira

