hadoop-mapreduce-issues mailing list archives

From "Chris Douglas (JIRA)" <j...@apache.org>
Subject [jira] Updated: (MAPREDUCE-712) TextWritter example is CPU bound!!
Date Wed, 15 Jul 2009 04:41:15 GMT

     [ https://issues.apache.org/jira/browse/MAPREDUCE-712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Douglas updated MAPREDUCE-712:
------------------------------------

    Attachment: MR712-0.patch

RandomTextWriter is probably spending most of its CPU time on inefficient work, mostly
in generateSentence and Text::encode. For each word, generateSentence generates a random number
and appends a String to a StringBuffer; the buffer is then converted to a full String, encoded
as Text, and finally written out after looking up the counters in the Context for that
particular record. This process generates a *lot* of garbage, so Owen and Arun's hypothesis
that we're spending an inordinate amount of time in GC seems well founded.

The attached should be more sparing of the CPU. Would you mind confirming?
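
As a rough illustration of the allocation pattern described above, and not the contents of MR712-0.patch, here is a minimal sketch of one way to cut the per-record garbage: encode each word to UTF-8 once up front, then build every sentence by copying bytes into a single reused buffer. The class SentenceSketch, the ReusableBuffer helper, and the four-word list are all hypothetical names invented for this example, and it uses only the JDK so that it runs without Hadoop.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Random;

// Hypothetical sketch: names and word list are assumptions, not Hadoop code.
final class SentenceSketch {
    // Tiny stand-in word list, used here only for illustration.
    static final String[] WORDS = {"diurnalness", "officiousness", "pleasurehood", "stroking"};

    // Each word (plus a trailing space) is encoded to UTF-8 exactly once.
    static final byte[][] ENCODED = new byte[WORDS.length][];
    static {
        for (int i = 0; i < WORDS.length; i++) {
            ENCODED[i] = (WORDS[i] + " ").getBytes(StandardCharsets.UTF_8);
        }
    }

    // Growable byte buffer that is cleared and reused across records.
    static final class ReusableBuffer {
        byte[] bytes = new byte[256];
        int length = 0;

        void clear() {
            length = 0; // keep the backing array, just reset the write position
        }

        void append(byte[] src) {
            if (length + src.length > bytes.length) {
                // Grow only when needed; this is the sole allocation in steady state.
                bytes = Arrays.copyOf(bytes, Math.max(bytes.length * 2, length + src.length));
            }
            System.arraycopy(src, 0, bytes, length, src.length);
            length += src.length;
        }
    }

    // Fills 'out' with wordCount random words; no String or encoder objects are created.
    static void generateSentence(Random random, int wordCount, ReusableBuffer out) {
        out.clear();
        for (int i = 0; i < wordCount; i++) {
            out.append(ENCODED[random.nextInt(ENCODED.length)]);
        }
    }

    public static void main(String[] args) {
        Random random = new Random();
        ReusableBuffer sentence = new ReusableBuffer();
        for (int record = 0; record < 3; record++) {
            generateSentence(random, 5, sentence);
            // Decoding to a String here is only for display.
            System.out.println(new String(sentence.bytes, 0, sentence.length, StandardCharsets.UTF_8));
        }
    }
}
```

In an actual mapper the reused buffer would presumably be a Text instance held in a field, since Text provides clear() and append(byte[], int, int); whether the attached patch takes that route is not stated in this message.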

> TextWritter example is CPU bound!!
> ----------------------------------
>
>                 Key: MAPREDUCE-712
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-712
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: task
>    Affects Versions: 0.20.1, 0.21.0
>         Environment: ~200 nodes cluster
> Each node has the following configuration:
> Processors:     2 x Xeon L5420 2.50GHz (8 cores) - Harpertown C0, 64-bit, quad-core (8 CPUs)
> 4 Disks
> 16 GB RAM
> Linux 2.6
> Hadoop version: trunk
>            Reporter: Khaled Elmeleegy
>         Attachments: MR712-0.patch
>
>
> Running the RandomTextWriter example job (from the examples jar) pegs the machines' CPUs.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

