Mailing-List: contact mapreduce-issues-help@hadoop.apache.org; run by ezmlm
Reply-To: mapreduce-issues@hadoop.apache.org
Message-ID: <1262126687.1247632875246.JavaMail.jira@brutus>
Date: Tue, 14 Jul 2009 21:41:15 -0700 (PDT)
From: "Chris Douglas (JIRA)"
To: mapreduce-issues@hadoop.apache.org
Subject: [jira] Updated: (MAPREDUCE-712) TextWritter example is CPU bound!!
In-Reply-To: <911289839.1246912754875.JavaMail.jira@brutus>

     [ https://issues.apache.org/jira/browse/MAPREDUCE-712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Douglas updated MAPREDUCE-712:
------------------------------------

    Attachment: MR712-0.patch

RandomTextWriter probably spends most of its CPU time doing its work inefficiently, mostly in generateSentence and Text::encode. For each word, generateSentence generates a random number and appends a String to a StringBuffer. The buffer is then converted to a full String, encoded as Text, and finally written out after looking up the counters in the Context for that particular record. This process generates a *lot* of garbage, so Owen and Arun's hypothesis that we're spending an inordinate amount of time in GC seems well founded.

The attached patch should be more sparing of the CPU. Would you mind confirming?

> TextWritter example is CPU bound!!
> ----------------------------------
>
>                 Key: MAPREDUCE-712
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-712
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: task
>    Affects Versions: 0.20.1, 0.21.0
>         Environment: ~200 nodes cluster
> Each node has the following configuration:
> Processors: 2 x Xeon L5420 2.50GHz (8 cores) - Harpertown C0, 64-bit, quad-core (8 CPUs)
> 4 Disks
> 16 GB RAM
> Linux 2.6
> Hadoop version: trunk
>            Reporter: Khaled Elmeleegy
>         Attachments: MR712-0.patch
>
>
> Running the RandomTextWriter example job (from the examples jar) pegs the machines' CPUs.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
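
For readers of the archive, the per-word allocation pattern described in the comment above can be avoided roughly as follows. This is only a minimal sketch of the general idea (encode each word once, then append the bytes into a reusable buffer); it is not the contents of MR712-0.patch. The class name SentenceSketch, the four-entry WORDS list, and the methods fillSentence and currentSentence are all hypothetical, and plain java.io classes stand in for Hadoop's Text so the sketch runs without Hadoop on the classpath.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Random;

// Hypothetical illustration; not taken from RandomTextWriter or the patch.
class SentenceSketch {
    // Assumed stand-in for the example's (much larger) word list.
    static final String[] WORDS = {"alpha", "beta", "gamma", "delta"};

    // Encode every word to UTF-8 once, up front, instead of once per record.
    static final byte[][] ENCODED = new byte[WORDS.length][];
    static {
        for (int i = 0; i < WORDS.length; i++) {
            ENCODED[i] = WORDS[i].getBytes(StandardCharsets.UTF_8);
        }
    }

    // Reused across calls: reset() keeps the backing array, so building a
    // sentence allocates nothing once the buffer has grown large enough.
    private final ByteArrayOutputStream buf = new ByteArrayOutputStream(1024);
    private final Random random;

    SentenceSketch(long seed) {
        this.random = new Random(seed);
    }

    // Fills the shared buffer with noWords space-separated words and
    // returns the sentence length in bytes.
    int fillSentence(int noWords) {
        buf.reset();
        for (int i = 0; i < noWords; i++) {
            if (i > 0) {
                buf.write(' ');
            }
            byte[] word = ENCODED[random.nextInt(ENCODED.length)];
            buf.write(word, 0, word.length);
        }
        return buf.size();
    }

    // Debug helper only: this copies the buffer and allocates a String.
    String currentSentence() {
        return new String(buf.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

In a real mapper the reusable buffer would presumably be a long-lived Text instance handed to the output on each record, so that no intermediate String or fresh Text is created per sentence; that mapping to the Hadoop API is an assumption here, not something the message states.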