hadoop-mapreduce-issues mailing list archives

From "Konstantin Shvachko (Updated) (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (MAPREDUCE-1248) Redundant memory copying in StreamKeyValUtil
Date Wed, 23 Nov 2011 06:12:40 GMT

     [ https://issues.apache.org/jira/browse/MAPREDUCE-1248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Konstantin Shvachko updated MAPREDUCE-1248:

    Affects Version/s: 0.22.0
             Assignee: Ruibang He
> Redundant memory copying in StreamKeyValUtil
> --------------------------------------------
>                 Key: MAPREDUCE-1248
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1248
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: contrib/streaming
>    Affects Versions: 0.22.0
>            Reporter: Ruibang He
>            Assignee: Ruibang He
>            Priority: Minor
>             Fix For: 0.22.0
>         Attachments: MAPREDUCE-1248-v1.0.patch
> I found that when MROutputThread collects the output of the Reducer, it calls StreamKeyValUtil.splitKeyVal(),
which allocates two local byte arrays for each line of output. These two byte arrays are then
passed to key and val. The data is therefore copied twice: once by System.arraycopy()
and again inside key.set() / val.set().
> This doubles the memory copying for the whole output (which may lead to higher CPU
consumption) and causes frequent temporary object allocation.
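The double copy described above can be illustrated with a minimal, self-contained Java sketch. `SplitSketch`, `TextLike`, `splitDoubleCopy`, and `splitSingleCopy` are hypothetical names; `TextLike` stands in for Hadoop's `Text`, assuming only that its `set(byte[], int, int)` copies the given range into a reusable buffer. The sketch counts copied bytes rather than measuring CPU time, and it is not the code from the attached patch.

```java
import java.nio.charset.StandardCharsets;

public class SplitSketch {

    /** Hypothetical Text-like holder: set() copies the range into its own reusable buffer. */
    static final class TextLike {
        byte[] bytes = new byte[0];
        int length;
        static long bytesCopied; // total bytes copied by set() across all holders

        void set(byte[] src, int start, int len) {
            if (bytes.length < len) {
                bytes = new byte[len]; // grow only when needed, otherwise reuse
            }
            System.arraycopy(src, start, bytes, 0, len);
            length = len;
            bytesCopied += len;
        }

        String asString() {
            return new String(bytes, 0, length, StandardCharsets.UTF_8);
        }
    }

    static long tempBytesCopied; // bytes copied into per-line temporary arrays

    /** Pattern reported in the issue: line -> temporary arrays -> holders (two copies). */
    static void splitDoubleCopy(byte[] utf, int start, int length,
                                TextLike key, TextLike val, int splitPos, int sepLen) {
        byte[] keyBytes = new byte[splitPos - start];                     // allocated per line
        byte[] valBytes = new byte[length - (splitPos - start) - sepLen]; // allocated per line
        System.arraycopy(utf, start, keyBytes, 0, keyBytes.length);       // first copy
        System.arraycopy(utf, splitPos + sepLen, valBytes, 0, valBytes.length);
        tempBytesCopied += keyBytes.length + valBytes.length;
        key.set(keyBytes, 0, keyBytes.length);                            // second copy
        val.set(valBytes, 0, valBytes.length);
    }

    /** Alternative: pass ranges of the original line straight to the holders (one copy). */
    static void splitSingleCopy(byte[] utf, int start, int length,
                                TextLike key, TextLike val, int splitPos, int sepLen) {
        key.set(utf, start, splitPos - start);
        int valStart = splitPos + sepLen;
        val.set(utf, valStart, start + length - valStart);
    }

    public static void main(String[] args) {
        byte[] line = "apple\t42".getBytes(StandardCharsets.UTF_8);
        int tab = 5; // index of the '\t' separator in this line
        TextLike key = new TextLike();
        TextLike val = new TextLike();

        splitDoubleCopy(line, 0, line.length, key, val, tab, 1);
        long doubleTotal = tempBytesCopied + TextLike.bytesCopied;

        TextLike.bytesCopied = 0;
        splitSingleCopy(line, 0, line.length, key, val, tab, 1);
        long singleTotal = TextLike.bytesCopied;

        System.out.println(key.asString() + " | " + val.asString());
        System.out.println("double-copy bytes: " + doubleTotal + ", single-copy bytes: " + singleTotal);
    }
}
```

For the line `apple\t42`, the double-copy path copies 14 bytes and allocates two temporary arrays, while the single-copy path copies 7 bytes and, once the holders' buffers are large enough, allocates nothing.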

This message is automatically generated by JIRA.

