Mailing-List: contact user-help@hbase.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@hbase.apache.org
MIME-Version: 1.0
In-Reply-To: 
 <CAGM+STGVdjTLD9N0W6w+2NxigKuNnebAaZ4GG8duhPJa-f51mw@mail.gmail.com>
References: 
 <CAGM+STEWcjtnvyKoxHFn5C9kLgNcAf3uB=pM0uEetnJj+EdCOg@mail.gmail.com>
 <CALte62w0X1ecchy5hSHZxyh1dywbm3zOJZvVKyhR+XPpPeX4dw@mail.gmail.com>
 <CAGM+STFOrVZUg52L_L6Oxa-nQr7tMxyva_MCCbb0AqaBn8mHYQ@mail.gmail.com>
 <BLU436-SMTP36DA520779C7E2F89188688FD90@phx.gbl>
 <CAGM+STGyc9y+UV5UghCAsWmg61Lk_dAHXi01_SRqJ6EHfHYNtA@mail.gmail.com>
 <3FB0728B-341D-4D9F-BD0A-ED434560E23E@gmail.com>
 <CAGM+STGVdjTLD9N0W6w+2NxigKuNnebAaZ4GG8duhPJa-f51mw@mail.gmail.com>
From: anil gupta <anilgupta84@gmail.com>
Date: Wed, 13 May 2015 13:06:12 -0700
Message-ID: 
 <CAF1+Vs8WEUF=QyPjGTu63CVv8HaY71KPKj1Ou1bA62pjnSV1Ow@mail.gmail.com>
Subject: Re: MR against snapshot causes High CPU usage on Datanodes
To: "user@hbase.apache.org" <user@hbase.apache.org>
Content-Type: multipart/alternative; boundary=089e0122a26cc4fea40515fc254f

--089e0122a26cc4fea40515fc254f
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

Inline.

On Wed, May 13, 2015 at 10:31 AM, rahul malviya <malviyarahul2001@gmail.com>
wrote:

> *How many mappers/reducers are running per node for this job?*
> I am running 7-8 mappers per node. The spike is seen in the map phase, so no
> reducers were running at that point in time.
>
> *Also, how many mappers are running as data-local mappers?*
> How do I determine this?
>
Open the counters page for your job in the web UI and look for the
"Data-local map tasks" counter.
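If you would rather check this from the job driver than the web UI, the same value is exposed through the Hadoop counters API (JobCounter.DATA_LOCAL_MAPS, alongside JobCounter.TOTAL_LAUNCHED_MAPS). Below is a minimal sketch, assuming you pass the two counter values in yourself; the MapLocality class and its method are hypothetical names for illustration, not part of Hadoop.

```java
// Hypothetical helper (not part of Hadoop): computes the share of map tasks
// that ran data-local. In a real driver the inputs would come from the job
// counters, e.g. (org.apache.hadoop.mapreduce.JobCounter):
//   long local = job.getCounters().findCounter(JobCounter.DATA_LOCAL_MAPS).getValue();
//   long total = job.getCounters().findCounter(JobCounter.TOTAL_LAUNCHED_MAPS).getValue();
// They are plain arguments here so the sketch compiles without Hadoop jars.
final class MapLocality {

    private MapLocality() {}

    /** Fraction of launched map tasks that were data-local; 0.0 if none ran. */
    static double dataLocalFraction(long dataLocalMaps, long totalLaunchedMaps) {
        if (dataLocalMaps < 0 || totalLaunchedMaps < 0
                || dataLocalMaps > totalLaunchedMaps) {
            throw new IllegalArgumentException("inconsistent counter values");
        }
        return totalLaunchedMaps == 0 ? 0.0 : (double) dataLocalMaps / totalLaunchedMaps;
    }

    public static void main(String[] args) {
        // Illustrative defaults only; pass the numbers from your job's counters.
        long dataLocal = Long.parseLong(args.length > 0 ? args[0] : "45");
        long launched = Long.parseLong(args.length > 1 ? args[1] : "60");
        System.out.printf("data-local map tasks: %d of %d (%.1f%%)%n",
                dataLocal, launched, 100 * dataLocalFraction(dataLocal, launched));
    }
}
```

A low fraction would mean many mappers are pulling their input over the network from other DataNodes rather than from local disks.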

>
>
> *Is your load/data equally distributed?*
> Yes, we use pre-split hash keys in our HBase cluster, so the data is pretty
> evenly distributed.
>
> Thanks,
> Rahul
>
>
> On Wed, May 13, 2015 at 10:25 AM, Anil Gupta <anilgupta84@gmail.com>
> wrote:
>
> > How many mappers/reducers are running per node for this job?
> > Also, how many mappers are running as data-local mappers?
> > Is your load/data equally distributed?
> >
> > Your disk/CPU ratio looks OK.
> >
> > > On May 13, 2015, at 10:12 AM, rahul malviya
> > > <malviyarahul2001@gmail.com> wrote:
> > >
> > > *The high CPU may be I/O wait, which would mean that your CPU is
> > > waiting for reads from the local disks.*
> > >
> > > Yes, I think that's what is going on, but I am trying to understand why
> > > it happens only with the snapshot MR; if I run the same job without
> > > using the snapshot, everything is normal. What is the difference in the
> > > snapshot version that could cause such a spike? I am looking through the
> > > code of the snapshot version to see if I can find something.
> > >
> > > cores / disks == 24 / 12 or 40 / 12.
> > >
> > > We are using 10K RPM SATA drives on our DataNodes.
> > >
> > > Rahul
> > >
> > > On Wed, May 13, 2015 at 10:00 AM, Michael Segel
> > > <michael_segel@hotmail.com> wrote:
> > >
> > >> Without knowing your exact configuration…
> > >>
> > >> The high CPU may be I/O wait, which would mean that your CPU is
> > >> waiting for reads from the local disks.
> > >>
> > >> What's the ratio of cores (physical) to disks?
> > >> What type of disks are you using?
> > >>
> > >> That's going to be the most likely culprit.
> > >>>> On May 13, 2015, at 11:41 AM, rahul malviya
> > >>>> <malviyarahul2001@gmail.com> wrote:
> > >>>
> > >>> Yes.
> > >>>
> > >>>> On Wed, May 13, 2015 at 9:40 AM, Ted Yu <yuzhihong@gmail.com> wrote:
> > >>>>
> > >>>> Have you enabled short-circuit reads?
> > >>>>
> > >>>> Cheers
> > >>>>
> > >>>> On Wed, May 13, 2015 at 9:37 AM, rahul malviya
> > >>>> <malviyarahul2001@gmail.com> wrote:
> > >>>>
> > >>>>> Hi,
> > >>>>>
> > >>>>> I have recently started running MR on HBase snapshots, but while the
> > >>>>> MR is running there is pretty high CPU usage on the DataNodes, and I
> > >>>>> start seeing I/O wait messages in the DataNode logs. As soon as I kill
> > >>>>> the MR on the snapshot, everything comes back to normal.
> > >>>>>
> > >>>>> What could be causing this?
> > >>>>>
> > >>>>> I am running the CDH 5.2.0 distribution.
> > >>>>>
> > >>>>> Thanks,
> > >>>>> Rahul
> > >>
> > >>
> >
>
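To put the per-node figures quoted in the thread side by side (24 or 40 cores and 12 disks per node, with up to 8 concurrent mappers), here is a small arithmetic sketch of the per-disk ratios. NodeRatios is a hypothetical name used only for illustration; the sketch makes no claim about what ratio is "right", it only computes the numbers being discussed.

```java
// Hypothetical helper: per-disk load ratios for a worker node, using the
// figures quoted in this thread (24 or 40 cores, 12 disks, up to 8 mappers).
final class NodeRatios {

    private NodeRatios() {}

    /** How many of something (cores, concurrent mappers) share each disk. */
    static double perDisk(int count, int disks) {
        if (disks <= 0) {
            throw new IllegalArgumentException("disks must be > 0");
        }
        return (double) count / disks;
    }

    public static void main(String[] args) {
        int disks = 12;
        for (int cores : new int[] {24, 40}) {
            System.out.printf("%d cores / %d disks = %.2f cores per disk%n",
                    cores, disks, perDisk(cores, disks));
        }
        System.out.printf("8 mappers / %d disks = %.2f mappers per disk%n",
                disks, perDisk(8, disks));
    }
}
```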


-- 
Thanks & Regards,
Anil Gupta

--089e0122a26cc4fea40515fc254f--