From: anil gupta
Date: Wed, 13 May 2015 13:06:12 -0700
Subject: Re: MR against snapshot causes High CPU usage on Datanodes
To: "user@hbase.apache.org"

Inline.

On Wed, May 13, 2015 at 10:31 AM, rahul malviya wrote:

> *How many mappers/reducers are running per node for this job?*
> I am running 7-8 mappers per node. The spike is seen in the mapper phase, so no
> reducers were running at that point in time.
>
> *Also, how many mappers are running as data-local mappers?*
> How do I determine this?
>

On the counters web page of your job, look for the "Data-local map tasks" counter.

>
>
> *Is your load/data equally distributed?*
> Yes. We use presplit hash keys in our HBase cluster, so the data is pretty
> evenly distributed.
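The "Data-local map tasks" counter suggested above is easier to judge as a fraction of all launched maps. Below is a minimal sketch that turns counter values copied by hand from a job's counters page into locality ratios. The display names are assumed to match a Hadoop 2 job page, so check them against your own UI. The `locality_report` helper is hypothetical. Any launched map that is neither data-local nor rack-local is lumped into "other", which is only an approximation because launched maps can include failed or speculative attempts.

```python
from typing import Dict, Mapping

# Counter display names as assumed to appear on a Hadoop 2 job counters page
# (verify against your own JobHistory UI).
LAUNCHED = "Launched map tasks"
DATA_LOCAL = "Data-local map tasks"
RACK_LOCAL = "Rack-local map tasks"


def locality_report(counters: Mapping[str, int]) -> Dict[str, float]:
    """Return the fraction of launched map tasks in each locality bucket.

    Counters missing from the mapping are treated as zero. "other" is
    whatever is neither data-local nor rack-local (an approximation).
    """
    launched = counters.get(LAUNCHED, 0)
    if launched <= 0:
        raise ValueError("need a positive 'Launched map tasks' count")
    data_local = counters.get(DATA_LOCAL, 0)
    rack_local = counters.get(RACK_LOCAL, 0)
    # Clamp at zero in case attempt accounting makes the buckets overlap.
    other = max(launched - data_local - rack_local, 0)
    return {
        "data_local": data_local / launched,
        "rack_local": rack_local / launched,
        "other": other / launched,
    }
```

Usage is a single call such as `locality_report({"Launched map tasks": 100, "Data-local map tasks": 90, "Rack-local map tasks": 8})`, with whatever numbers your job actually reports. A low data-local fraction would mean many mappers read their input over the network from other datanodes rather than from local disks.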
>
> Thanks,
> Rahul
>
>
> On Wed, May 13, 2015 at 10:25 AM, Anil Gupta wrote:
>
> > How many mappers/reducers are running per node for this job?
> > Also, how many mappers are running as data-local mappers?
> > Is your load/data equally distributed?
> >
> > Your disk-to-CPU ratio looks OK.
> >
> > Sent from my iPhone
> >
> > On May 13, 2015, at 10:12 AM, rahul malviya <malviyarahul2001@gmail.com> wrote:
> >
> > > *The high CPU may be WAIT IOs, which would mean that your CPU is waiting
> > > for reads from the local disks.*
> > >
> > > Yes, I think that's what is going on, but I am trying to understand why it
> > > happens only with the snapshot MR; if I run the same job without a
> > > snapshot, everything is normal. What is different in the snapshot version
> > > that could cause such a spike? I am looking through the code of the
> > > snapshot version to see if I can find something.
> > >
> > > cores / disks == 24 / 12 or 40 / 12.
> > >
> > > We are using 10K SATA drives on our datanodes.
> > >
> > > Rahul
> > >
> > > On Wed, May 13, 2015 at 10:00 AM, Michael Segel <michael_segel@hotmail.com> wrote:
> > >
> > > > Without knowing your exact configuration…
> > > >
> > > > The high CPU may be WAIT IOs, which would mean that your CPU is waiting
> > > > for reads from the local disks.
> > > >
> > > > What's the ratio of cores (physical) to disks?
> > > > What type of disks are you using?
> > > >
> > > > That's going to be the most likely culprit.
> > > >
> > > > On May 13, 2015, at 11:41 AM, rahul malviya <malviyarahul2001@gmail.com> wrote:
> > > >
> > > > > Yes.
> > > > >
> > > > > On Wed, May 13, 2015 at 9:40 AM, Ted Yu wrote:
> > > > >
> > > > > > Have you enabled short-circuit read?
> > > > > >
> > > > > > Cheers
> > > > > >
> > > > > > On Wed, May 13, 2015 at 9:37 AM, rahul malviya <malviyarahul2001@gmail.com> wrote:
> > > > > >
> > > > > > > Hi,
> > > > > > >
> > > > > > > I have recently started running MR on HBase snapshots, but when the MR
> > > > > > > is running there is pretty high CPU usage on the datanodes and I start
> > > > > > > seeing IO wait messages in the datanode logs. As soon as I kill the MR
> > > > > > > on the snapshot, everything comes back to normal.
> > > > > > >
> > > > > > > What could be causing this?
> > > > > > >
> > > > > > > I am running the cdh5.2.0 distribution.
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Rahul

--
Thanks & Regards,
Anil Gupta
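The thread hinges on whether the datanode "CPU" spike is really I/O wait. One way to check on a Linux datanode is to sample the aggregate `cpu` line of `/proc/stat` twice and compute the iowait share of the elapsed jiffies. This is roughly the figure `vmstat` reports in its `wa` column. Below is a minimal sketch that assumes the standard Linux field order (user, nice, system, idle, iowait, irq, softirq, steal, then guest fields). It skips the guest fields because the kernel already counts them in user/nice. The helper names are hypothetical.

```python
import time
from typing import List

# Aggregate "cpu" line in Linux /proc/stat:
# user nice system idle iowait irq softirq steal [guest guest_nice]
IOWAIT_INDEX = 4
NUM_FIELDS = 8  # guest/guest_nice are already included in user/nice


def parse_cpu_line(stat_text: str) -> List[int]:
    """Return the first eight jiffy counters from the aggregate 'cpu' line."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":  # skips per-core cpu0, cpu1, ...
            values = [int(v) for v in fields[1:1 + NUM_FIELDS]]
            # Older kernels report fewer columns; pad the missing ones with 0.
            return values + [0] * (NUM_FIELDS - len(values))
    raise ValueError("no aggregate 'cpu' line found")


def iowait_percent(before: str, after: str) -> float:
    """Percentage of CPU time spent in iowait between two /proc/stat snapshots."""
    b = parse_cpu_line(before)
    a = parse_cpu_line(after)
    deltas = [x - y for x, y in zip(a, b)]
    total = sum(deltas)
    if total <= 0:
        raise ValueError("snapshots must be taken at different times")
    return 100.0 * deltas[IOWAIT_INDEX] / total


def sample_iowait(interval_s: float = 5.0) -> float:
    """Read /proc/stat twice, interval_s seconds apart (Linux only)."""
    with open("/proc/stat") as f:
        before = f.read()
    time.sleep(interval_s)
    with open("/proc/stat") as f:
        after = f.read()
    return iowait_percent(before, after)
```

Running `sample_iowait()` on a datanode once while the snapshot MR is active, and again while the non-snapshot job runs, would give two numbers to compare. Iowait is time the CPUs sat idle with disk I/O outstanding. A high value therefore points at the 10K SATA disks being saturated rather than at real computation, which is what Michael's cores-to-disks question is probing.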