From: Harsh J
Date: Thu, 6 Dec 2012 20:18:24 +0530
Subject: Re: M/R, Strange behavior with multiple Gzip files

I tend to agree with Jean-Marc's observation. If your job client logs
"LocalJobRunner" at any point, then that is most definitely your
problem.

Otherwise, if you feel you are facing a scheduling problem, then it is
most likely your scheduler configuration. For example, FairScheduler
has an attribute on its pools that you can set to control the maximum
parallel use of slots for jobs using that pool.

On Thu, Dec 6, 2012 at 8:10 PM, x6i4uybz labs wrote:
> Hello,
>
> The job isn't running in local mode. In fact, I think I just have a problem
> with the map task progress.
> The counters of each map task are OK during the job execution, whereas the
> progress of each map task stays at 0%.
>
> On Thu, Dec 6, 2012 at 1:34 PM, Jean-Marc Spaggiari wrote:
>>
>> Hi,
>>
>> Have you configured mapred-site.xml to tell where the job tracker
>> is? If not, your job is running on the local jobtracker, running the
>> tasks one by one.
>>
>> JM
>>
>> PS: I faced the same issue a few weeks ago and got the exact same
>> behaviour. This (above) solved the issue.
>>
>> 2012/12/6, x6i4uybz labs:
>> > Sorry,
>> >
>> > I wrote an M/R job to process several gz files (about 2000). I have a
>> > cluster with 80 map slots.
>> > The JT instantiates one map per gz file (not splittable, that's OK).
>> >
>> > The first 80 maps spawn.
>> > But after the "initializing" state, it seems there is only one map
>> > running. When this map finishes, another one starts (not 80 maps in
>> > parallel), and another is assigned to the empty slot.
>> >
>> > I've also noticed that the first maps last more than one hour and the
>> > last maps 50 seconds.
>> > Each gz file is between 10 MB and 100 MB.
>> >
>> > I don't understand this behavior.
>> > I will launch the job again to see if I have the same issue.
>> >
>> > thanks, gpo
>> >
>> > On Wed, Dec 5, 2012 at 6:33 PM, Harsh J wrote:
>> >
>> >> Your problem isn't clear from your description. Can you please
>> >> rephrase it in terms of what you are expecting vs. what you are
>> >> observing?
>> >>
>> >> Also note that Gzip files are not splittable by nature of their codec
>> >> algorithm, and hence a TextInputFormat over plain/regular Gzip files
>> >> would end up processing each whole Gzip file via one mapper, instead
>> >> of multiple mappers per file.
>> >>
>> >> On Wed, Dec 5, 2012 at 9:32 PM, x6i4uybz labs wrote:
>> >> > Hi everybody,
>> >> >
>> >> > I have an M/R job which does a bulk import to HBase.
>> >> > I have to process many gzip files (2800 x ~100 MB).
>> >> >
>> >> > I don't understand why my job instantiates 80 maps but runs each map
>> >> > sequentially, as if there were only one big gz file.
>> >> >
>> >> > Is there a problem in my driver? Or maybe I am missing something.
>> >> > I use "FileInputFormat.addInputPath(job, new Path(args[0]))" where
>> >> > args[0] is a directory.
>> >> >
>> >> > Can you help me, please?
>> >> >
>> >> > Thanks, Guillaume
>> >>
>> >> --
>> >> Harsh J

--
Harsh J
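
P.S. On the quoted point that Gzip files are not splittable: the
sketch below is one rough way to see it for yourself. It uses only the
JDK's java.util.zip classes, not Hadoop's codec or InputFormat classes,
and the class name GzipSplitDemo, its helper methods, and the sample
data are all made up for illustration. It compresses some lines into a
single gzip stream and then tries to decompress starting from the
middle of the compressed bytes, which is roughly what a second mapper
would have to do if the file were split.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipSplitDemo {

    // Build some deterministic, line-oriented sample data (hypothetical content).
    public static byte[] sampleData(int rows) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < rows; i++) {
            sb.append("row-").append(i).append('\n');
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    // Compress the bytes into one gzip stream (header + deflate data + trailer).
    public static byte[] gzip(byte[] raw) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(bos)) {
            out.write(raw);
        }
        return bos.toByteArray();
    }

    // Try to decompress; return the number of decompressed bytes,
    // or -1 if the bytes cannot be read as a gzip stream.
    public static int tryGunzip(byte[] compressed) {
        try (GZIPInputStream in =
                 new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            byte[] buf = new byte[8192];
            int total = 0;
            int n;
            while ((n = in.read(buf)) != -1) {
                total += n;
            }
            return total;
        } catch (IOException e) {
            // Covers a missing gzip header, corrupt deflate data, bad trailer, or EOF.
            return -1;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] raw = sampleData(20000);
        byte[] gz = gzip(raw);

        // Reading from the start works and yields all the original bytes.
        System.out.println("whole file: " + tryGunzip(gz) + " of " + raw.length + " bytes");

        // Reading from the midpoint fails: there is no gzip header there, and
        // the deflate data depends on everything that came before it.
        byte[] secondHalf = Arrays.copyOfRange(gz, gz.length / 2, gz.length);
        System.out.println("second half readable: " + (tryGunzip(secondHalf) != -1));
    }
}
```

Since a reader cannot start in the middle, each .gz file has to go to
one mapper as a whole, which matches the one-map-per-file behaviour
described in the thread.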