Date: Tue, 29 Mar 2011 11:18:17 -0700
Subject: How do I increase mapper granularity?
From: "W.P. McNeill"
To: common-user@hadoop.apache.org
Cc: Siddharth Agrawal, Andrew Borthwick

I'm running a job whose mappers take a long time, which causes problems such as starving other jobs that want to run on the same cluster. Rewriting the mapper algorithm is not currently an option, but I still need a way to increase the number of mappers so that the work is split at a finer granularity.

What is the best way to do this? Looking through the O'Reilly book and starting from this Wiki page, I've come up with a couple of ideas:

1. Set mapred.map.tasks to the value I want.
2. Decrease the block size of my input files.

What are the gotchas with these approaches? I know that (1) may not work because this parameter is just a suggestion. Is there a command-line option that accomplishes (2), or do I have to do a distcp with a non-default block size? (I think the answer is that I have to do a distcp, but I'm making sure.)

Are there other approaches? Are there other gotchas that come with trying to increase mapper granularity? I know this can be more of an art than a science.

Thanks.
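
For reference, here is a minimal sketch of how ideas (1) and (2) could each affect the mapper count. It assumes the split-size rule used by the old-API (org.apache.hadoop.mapred) FileInputFormat, namely max(minSize, min(goalSize, blockSize)) with goalSize = totalSize / mapred.map.tasks. The class and method names (SplitSizeSketch, splitSize, estimatedMaps) are hypothetical, the input sizes are made up, and the estimate ignores per-file boundaries and the slop Hadoop allows on the last split, so treat the numbers as approximate.

```java
public class SplitSizeSketch {

    // Assumed rule: splitSize = max(minSize, min(goalSize, blockSize)),
    // where goalSize = totalBytes / requested number of map tasks.
    static long splitSize(long totalBytes, int requestedMaps,
                          long minSplitBytes, long blockBytes) {
        long goalSize = totalBytes / Math.max(1, requestedMaps);
        return Math.max(minSplitBytes, Math.min(goalSize, blockBytes));
    }

    // Rough mapper count for a single file: ceiling of total / split size.
    static long estimatedMaps(long totalBytes, long splitBytes) {
        return (totalBytes + splitBytes - 1) / splitBytes;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long total = 1024 * mb;   // hypothetical 1 GB input file
        long minSplit = 1;        // assumed default minimum split size

        // Baseline: 64 MB blocks, small hint. The block size caps the split.
        long base = splitSize(total, 2, minSplit, 64 * mb);
        System.out.println("baseline maps: " + estimatedMaps(total, base));

        // Idea (1): a hint larger than the block count shrinks goalSize.
        long hinted = splitSize(total, 64, minSplit, 64 * mb);
        System.out.println("with larger hint: " + estimatedMaps(total, hinted));

        // Idea (2): smaller blocks shrink the split even with a small hint.
        long smallBlocks = splitSize(total, 2, minSplit, 16 * mb);
        System.out.println("with 16 MB blocks: " + estimatedMaps(total, smallBlocks));
    }
}
```

Under this assumed rule the hint in (1) only has an effect when it asks for more maps than there are blocks, which is one reason it behaves like a suggestion rather than a setting.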