[ http://issues.apache.org/jira/browse/HADOOP-331?page=all ]

Devaraj Das reassigned HADOOP-331:
----------------------------------

    Assignee: Devaraj Das  (was: Yoram Arnon)

Looking at generating a single map output file per map for now. The plan is to do the following:

Define a class called PartKey that contains two fields: the partition number (int) and the actual key (WritableComparable).

As we are mapping, the PartKeys and the associated values are written to a buffer. The buffer has a fixed size of 128M (configurable via map.output.buffer.size). When the buffer is full, it is sorted and spilled to disk. We may end up with several of these spilled buffers, so we do a merge at the end.

The sort takes the partition number into account, and the merge also emits the offsets at which each partition resides in the merged file.

The copying phase of reduce strips the partition information contained in the PartKey and feeds the actual map-generated key to the reducer.

Makes sense?

> map outputs should be written to a single output file with an index
> -------------------------------------------------------------------
>
>                 Key: HADOOP-331
>                 URL: http://issues.apache.org/jira/browse/HADOOP-331
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.3.2
>            Reporter: eric baldeschwieler
>         Assigned To: Devaraj Das
>
> The current strategy of writing a file per target map is consuming a lot of unused buffer space (causing out of memory crashes) and puts a lot of burden on the FS (many opens, inodes used, etc).
> I propose that we write a single file containing all output and also write an index file identifying which byte range in the file goes to each reduce. This will remove the issue of buffer waste, address scaling issues with the number of open files, and generally set us up better for scaling.
> It will also have advantages with very small inputs, since the buffer cache will reduce the number of seeks needed and the data serving node can open a single file and just keep it open rather than needing to do directory and open ops on every request.
> The only issue I see is that in cases where the task output is substantially larger than its input, we may need to spill multiple times. In this case, we can do a merge after all spills are complete (or during the final spill).
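
The plan in the comment above (PartKey, sorted spills, a final merge that records where each partition starts) could be sketched roughly as follows. This is only an illustration under several assumptions, not the Hadoop implementation: all class and method names are hypothetical, String stands in for WritableComparable, in-memory lists stand in for spill files, the index holds record positions rather than byte offsets, and "takes the partition number into account" is read as "sort by partition first, then by key".

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch; none of these names come from the Hadoop code base.
final class PartKeySketch {

    /** A map output record tagged with its target partition (reduce) number. */
    static final class PartKey implements Comparable<PartKey> {
        final int partition;
        final String key;    // assumption: stands in for a WritableComparable
        final String value;

        PartKey(int partition, String key, String value) {
            this.partition = partition;
            this.key = key;
            this.value = value;
        }

        /** Orders by partition number first, then by the actual key. */
        @Override
        public int compareTo(PartKey other) {
            int c = Integer.compare(partition, other.partition);
            return c != 0 ? c : key.compareTo(other.key);
        }
    }

    /** Sorts one full buffer; the returned list stands in for a spill file. */
    static List<PartKey> sortSpill(List<PartKey> buffer) {
        List<PartKey> spill = new ArrayList<>(buffer);
        Collections.sort(spill);
        return spill;
    }

    /** K-way merge of already sorted spills into one sorted run. */
    static List<PartKey> merge(List<List<PartKey>> spills) {
        // Each heap entry is {spill number, position within that spill}.
        PriorityQueue<int[]> heap = new PriorityQueue<>(
            (a, b) -> spills.get(a[0]).get(a[1]).compareTo(spills.get(b[0]).get(b[1])));
        for (int i = 0; i < spills.size(); i++) {
            if (!spills.get(i).isEmpty()) {
                heap.add(new int[] {i, 0});
            }
        }
        List<PartKey> merged = new ArrayList<>();
        while (!heap.isEmpty()) {
            int[] head = heap.poll();
            List<PartKey> spill = spills.get(head[0]);
            merged.add(spill.get(head[1]));
            if (head[1] + 1 < spill.size()) {
                heap.add(new int[] {head[0], head[1] + 1});
            }
        }
        return merged;
    }

    /**
     * index[p] is the position of the first record of partition p in the
     * merged run, and index[numPartitions] is the total record count.
     */
    static int[] buildIndex(List<PartKey> merged, int numPartitions) {
        int[] index = new int[numPartitions + 1];
        for (PartKey record : merged) {
            index[record.partition + 1]++;      // count records per partition
        }
        for (int p = 0; p < numPartitions; p++) {
            index[p + 1] += index[p];           // prefix sums give start positions
        }
        return index;
    }

    /** What a reduce would read: its own range, with the partition number stripped. */
    static List<String> keysFor(List<PartKey> merged, int[] index, int partition) {
        List<String> keys = new ArrayList<>();
        for (int i = index[partition]; i < index[partition + 1]; i++) {
            keys.add(merged.get(i).key);
        }
        return keys;
    }
}
```

Because every spill is sorted by partition before key, the merged run keeps each partition contiguous, which is what lets a single index entry per partition describe the range a reduce has to copy.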