hadoop-common-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Sameer Paranjpye (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-331) map outputs should be written to a single output file with an index
Date Tue, 17 Oct 2006 19:27:38 GMT
    [ http://issues.apache.org/jira/browse/HADOOP-331?page=comments#action_12443043 ] 
Sameer Paranjpye commented on HADOOP-331:

> But then we cannot use SequenceFile's merger, since keys wouldn't be independently comparable,

Yes, you're right, they wouldn't. This optimization is probably not worth the trouble.

To summarize, this is what we appear to have ended up with.

- A couple of key variants
class KeyByteOffset {
  WritableComparable key;
  int offset;
  int length;

class PartKey {
  int partition;
  WritableComparable key;

- In RAM data structures
List<KeyByteOffset>[NumReduces]  keyLists; // one list of keys per reduce
byte[] valueBuffer; // serialized values are appended to this buffer

- For each <key, value> pair.
Determine partition for the key. Append the key to keyLists[partition], append the value to
If #records is greater than #maxRecords or #valueBytes is greater than #maxValueBytes
We have no more records to process.
SPILL - The spill is a block compressed sequence file of <PartKey, value> pairs, with
a <part, offset> index.   
Compressed blocks in this file must not span partitions. 

- At the end, if #numSpills > 1, merge spills

> map outputs should be written to a single output file with an index
> -------------------------------------------------------------------
>                 Key: HADOOP-331
>                 URL: http://issues.apache.org/jira/browse/HADOOP-331
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.3.2
>            Reporter: eric baldeschwieler
>         Assigned To: Devaraj Das
> The current strategy of writing a file per target map is consuming a lot of unused buffer
space (causing out of memory crashes) and puts a lot of burden on the FS (many opens, inodes
used, etc).  
> I propose that we write a single file containing all output and also write an index file
IDing which byte range in the file goes to each reduce.  This will remove the issue of buffer
waste, address scaling issues with number of open files and generally set us up better for
scaling.  It will also have advantages with very small inputs, since the buffer cache will
reduce the number of seeks needed and the data serving node can open a single file and just
keep it open rather than needing to do directory and open ops on every request.
> The only issue I see is that in cases where the task output is substantiallyu larger
than its input, we may need to spill multiple times.  In this case, we can do a merge after
all spills are complete (or during the final spill).

This message is automatically generated by JIRA.
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira


View raw message