Date: Thu, 11 Sep 2014 18:35:41 +0200
Subject: task slowness
From: Jakub Stransky
To: user@hadoop.apache.org

Hello experienced Hadoop users,

I have a data pipeline consisting of two Java MR jobs coordinated by the Oozie scheduler. Both of them process the same data, but the first one is more than 10 times slower than the second one. The job counters on the RM page are not much help in this matter. I have verified from our monitoring system that there were no hardware constraints such as IO, CPU, or network. Specifically, the job was using just a fraction of the resources allocated to the given container.

Is there a way to get some profiling statistics out of a Hadoop cluster task? What are the best available tools, required settings, etc.?

I have read the job tuning part of Hadoop: The Definitive Guide, but I am not sure that those settings are still valid for Hadoop 2.2.0.

Could someone point me to a good resource (e.g. a blog, manual, or book) where I can look for information? I am a bit confused about which settings refer to Hadoop 1 and which are the settings for Hadoop 2 / MR2.

The dataset size is around 500 MB compressed, and it is a map-only task.

Thanks for any experience shared,
Jakub