hadoop-common-dev mailing list archives

From "Chris Douglas (JIRA)" <j...@apache.org>
Subject [jira] Resolved: (HADOOP-1609) Optimize MapTask.MapOutputBuffer.spill() by not deserialize/serialize keys/values but use appendRaw
Date Fri, 20 Jun 2008 01:11:45 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-1609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Douglas resolved HADOOP-1609.
-----------------------------------

       Resolution: Duplicate
    Fix Version/s: 0.17.0

> Optimize MapTask.MapOutputBuffer.spill() by not deserialize/serialize keys/values but use appendRaw
> ---------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-1609
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1609
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.14.0
>            Reporter: Espen Amble Kolstad
>             Fix For: 0.17.0
>
>         Attachments: spill.patch, spill.patch
>
>
> In MapTask.MapOutputBuffer.spill(), every key and value is read from the buffer, deserialized, and then written to the file with append(key, value):
> {code}
>       DataInputBuffer keyIn = new DataInputBuffer();
>       DataInputBuffer valIn = new DataInputBuffer();
>       DataOutputBuffer valOut = new DataOutputBuffer();
>       while (resultIter.next()) {
>         keyIn.reset(resultIter.getKey().getData(), 
>                     resultIter.getKey().getLength());
>         key.readFields(keyIn);
>         valOut.reset();
>         (resultIter.getValue()).writeUncompressedBytes(valOut);
>         valIn.reset(valOut.getData(), valOut.getLength());
>         value.readFields(valIn);
>         writer.append(key, value);
>         reporter.progress();
>       }
> {code}
> When you have complex objects, like Nutch's ParseData or Inlinks, this takes time and creates lots of garbage.
> I've created a patch. It seems to be working, but I have only tested it on 0.13.0.
> It's a bit clumsy, since ValueBytes is cast to Un-/CompressedBytes in SequenceFile.Writer.
> Thoughts?
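
The idea in the description above can be illustrated without Hadoop. The following is only a minimal sketch using java.io streams: the class RawSpillSketch and its methods serialize, appendDecoded, appendRaw and spill are hypothetical names invented for this illustration, and they are not the SequenceFile.Writer API or the contents of spill.patch. The sketch assumes a record is a UTF string key followed by an int value. It shows that copying the already-serialized bytes gives the same output as decoding each record and encoding it again, while skipping the per-record object creation.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.Arrays;

// Hypothetical stand-in for a spill writer; not Hadoop's SequenceFile.Writer.
class RawSpillSketch {

    /** Serialize one record as (UTF key, int value); the layout is an assumption. */
    static byte[] serialize(String key, int value) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeUTF(key);
        out.writeInt(value);
        out.flush();
        return bytes.toByteArray();
    }

    /** Slow path: decode the record into objects, then encode it again. */
    static void appendDecoded(byte[] record, DataOutputStream spill) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(record));
        String key = in.readUTF();   // allocates a new String per record
        int value = in.readInt();
        spill.writeUTF(key);
        spill.writeInt(value);
    }

    /** Fast path: copy the serialized bytes without decoding them. */
    static void appendRaw(byte[] record, DataOutputStream spill) throws IOException {
        spill.write(record, 0, record.length);
    }

    /** Write all records to an in-memory "spill file" using the chosen path. */
    static byte[] spill(byte[][] records, boolean raw) throws IOException {
        ByteArrayOutputStream file = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(file);
        for (byte[] record : records) {
            if (raw) {
                appendRaw(record, out);
            } else {
                appendDecoded(record, out);
            }
        }
        out.flush();
        return file.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[][] records = { serialize("apple", 1), serialize("banana", 2) };
        byte[] decoded = spill(records, false);
        byte[] rawCopy = spill(records, true);
        // Both paths should produce byte-identical spill output.
        System.out.println(Arrays.equals(decoded, rawCopy));
    }
}
```

In this toy setting the saving is small, but for large nested values the decode step is where the time and garbage described above would come from.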

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

