Subject: Re: Question about how to find which file takes the longest time to process and how to assign more mappers to process that particular file
From: Hemanth Yamijala <hemanty@thoughtworks.com>
To: user@hadoop.apache.org
Date: Fri, 5 Oct 2012 09:51:15 +0530

Hi,

Roughly, this information is available on the 'Hadoop map task list' page of the MapReduce web UI (in Hadoop-1.0, which I am assuming is what you are using). You can reach this page by selecting the running tasks link on the job information page. The page has a table that lists all the tasks, and the status column tells you which part of the input each task is processing. Please note that, depending on the input format chosen, a task may be processing a *part* of a file, and not necessarily a file itself.
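If you want the same information in the task logs rather than the UI, something along the lines of the sketch below should work. This is just a sketch, assuming the new org.apache.hadoop.mapreduce API and a file-based input format such as TextInputFormat; the class name and the key/value types are placeholders for whatever your job actually uses:

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

// Hypothetical mapper - key/value types are placeholders for your job's own.
public class SplitLoggingMapper
        extends Mapper<LongWritable, Text, Text, LongWritable> {

    @Override
    protected void setup(Context context)
            throws IOException, InterruptedException {
        // File-based input formats hand each task a FileSplit: one file
        // plus the byte range [start, start + length) it should read.
        FileSplit split = (FileSplit) context.getInputSplit();
        System.err.println("Processing " + split.getPath()
                + " bytes " + split.getStart()
                + "-" + (split.getStart() + split.getLength()));
        // This ends up in the task's stderr log, reachable from the web UI.
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // ... your actual map logic ...
    }
}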
Another good source of information for seeing why these particular tasks are slow is the job's counters. Again, these counters can be accessed from the task list page of the web UI.
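If you'd rather pull the counters programmatically than read them off the web UI, a rough sketch like the one below, using the old mapred JobClient API, should dump every counter for a job given its id (the class name here is made up; pass your actual job id on the command line):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapred.Counters;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.JobID;
import org.apache.hadoop.mapred.RunningJob;

// Rough sketch: dump all counters of a job given its id, e.g.
// java JobCounterDump job_201210050921_0001 (the id is made up).
public class JobCounterDump {
    public static void main(String[] args) throws Exception {
        JobClient client = new JobClient(new JobConf(new Configuration()));
        RunningJob job = client.getJob(JobID.forName(args[0]));
        if (job == null) {
            System.err.println("No such job: " + args[0]);
            return;
        }
        Counters counters = job.getCounters();
        for (Counters.Group group : counters) {
            System.out.println(group.getDisplayName());
            for (Counters.Counter counter : group) {
                System.out.println("  " + counter.getDisplayName()
                        + " = " + counter.getValue());
            }
        }
    }
}

Note that these are job-level aggregates; the per-task counters (which are what you'd compare to spot the slow tasks) are easiest to read from the task details page in the UI.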
It would help if you could provide more information - like what job you're trying to run, the input format specified, etc.

Thanks
hemanth

On Fri, Oct 5, 2012 at 3:33 AM, Huanchen Zhang wrote:
> Hello,
>
> I have a question about how to find which file takes the longest time to
> process and how to assign more mappers to process that particular file.
>
> Currently, about three mappers take about five times longer than the rest
> to complete. So, how can I detect which specific files those three mappers
> are processing? If the above is doable, how can I assign more mappers to
> process those specific files?
>
> Thank you!
>
> Best,
> Huanchen