hbase-user mailing list archives

From Esteban Gutierrez <este...@cloudera.com>
Subject Re: MR against snapshot causes High CPU usage on Datanodes
Date Wed, 13 May 2015 18:36:20 GMT
rahul,

You might want to look into your MR counters too: if your tasks are
spilling too much to disk, or the shuffle phase is very large, that can
cause a lot of contention. Also, look at the OS/drive settings (e.g. write
cache off or irqbalance off). As Michael said, high CPU might not be a bad
thing; it depends on what the CPU cycles are being used for, e.g. user
time vs. system time.
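
For reference, a minimal sketch of reading those counters from a
client-side Job handle with the stock MapReduce API; the Job object and
the printout are illustrative only, not part of the original reply:

import java.io.IOException;
import org.apache.hadoop.mapreduce.Counters;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.TaskCounter;

public class SpillCheck {
    // Prints the spill/shuffle counters of a completed job.
    static void printSpillCounters(Job job) throws IOException {
        Counters c = job.getCounters();
        long spilled  = c.findCounter(TaskCounter.SPILLED_RECORDS).getValue();
        long mapOut   = c.findCounter(TaskCounter.MAP_OUTPUT_RECORDS).getValue();
        long shuffled = c.findCounter(TaskCounter.REDUCE_SHUFFLE_BYTES).getValue();
        // A spilled/map-output ratio well above 1 means map output is hitting
        // local disk more than once during the sort phase.
        System.out.printf("spilled=%d mapOutput=%d shuffleBytes=%d%n",
                spilled, mapOut, shuffled);
    }
}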

cheers,
esteban.


--
Cloudera, Inc.


On Wed, May 13, 2015 at 10:52 AM, Michael Segel <michael_segel@hotmail.com>
wrote:

> So …
>
> First, you’re wasting money on 10K drives. But that could be your
> company’s standard.
>
> Yes, you’re going to see red.
>
> 24 / 12, so is that 12 physical cores or 24 physical cores?
>
> I suspect those are dual chipped w 6 physical cores per chip.
> That’s 12 cores to 12 disks, which is ok.
>
> The 40 or 20 cores to 12 drives… that’s going to cause you trouble.
>
> Note: Seeing high levels of CPU may not be a bad thing.
>
> 7-8 mappers per node?  Not a lot of work for the number of cores…
>
>
>
> > On May 13, 2015, at 12:31 PM, rahul malviya <malviyarahul2001@gmail.com>
> wrote:
> >
> > *How many mapper/reducers are running per node for this job?*
> > I am running 7-8 mappers per node. The spike is seen in the mapper phase,
> > so no reducers were running at that point in time.
> >
> > *Also how many mappers are running as data local mappers?*
> > How to determine this?
> >
> >
> > * You load/data equally distributed?*
> > Yes, since we use pre-split hash keys in our HBase cluster, the data is
> > pretty evenly distributed.
> >
> > Thanks,
> > Rahul
> >
> >
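
On the "how to determine this" question above, the locality split is
exposed through the standard job counters (also visible on the job page in
the JobHistory web UI); a small sketch, assuming a client-side Job handle:

import java.io.IOException;
import org.apache.hadoop.mapreduce.Counters;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.JobCounter;

public class LocalityCheck {
    // Prints how many map tasks ran data-local vs. rack-local for a finished job.
    static void printLocality(Job job) throws IOException {
        Counters c = job.getCounters();
        long total     = c.findCounter(JobCounter.TOTAL_LAUNCHED_MAPS).getValue();
        long dataLocal = c.findCounter(JobCounter.DATA_LOCAL_MAPS).getValue();
        long rackLocal = c.findCounter(JobCounter.RACK_LOCAL_MAPS).getValue();
        System.out.printf("maps=%d data-local=%d rack-local=%d%n",
                total, dataLocal, rackLocal);
    }
}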
> > On Wed, May 13, 2015 at 10:25 AM, Anil Gupta <anilgupta84@gmail.com>
> wrote:
> >
> >> How many mapper/reducers are running per node for this job?
> >> Also how many mappers are running as data local mappers?
> >> You load/data equally distributed?
> >>
> >> Your disk, cpu ratio looks ok.
> >>
> >> Sent from my iPhone
> >>
> >>> On May 13, 2015, at 10:12 AM, rahul malviya <
> malviyarahul2001@gmail.com>
> >> wrote:
> >>>
> >>> *The high CPU may be WAIT IOs, which would mean that your CPUs are
> >>> waiting for reads from the local disks.*
> >>>
> >>> Yes, I think that's what is going on, but I am trying to understand why
> >>> it happens only in the case of snapshot MR; if I run the same job
> >>> without using a snapshot, everything is normal. What is different in the
> >>> snapshot version that can cause such a spike? I am looking through the
> >>> code for the snapshot version to see if I can find something.
> >>>
> >>> cores / disks == 24 / 12 or 40 / 12.
> >>>
> >>> We are using 10K SATA drives on our datanodes.
> >>>
> >>> Rahul
> >>>
> >>> On Wed, May 13, 2015 at 10:00 AM, Michael Segel <
> >> michael_segel@hotmail.com>
> >>> wrote:
> >>>
> >>>> Without knowing your exact configuration…
> >>>>
> >>>> The high CPU may be WAIT IOs, which would mean that your CPUs are
> >>>> waiting for reads from the local disks.
> >>>>
> >>>> What’s the ratio of cores (physical) to disks?
> >>>> What type of disks are you using?
> >>>>
> >>>> That’s going to be the most likely culprit.
> >>>>>> On May 13, 2015, at 11:41 AM, rahul malviya <
> >> malviyarahul2001@gmail.com>
> >>>>> wrote:
> >>>>>
> >>>>> Yes.
> >>>>>
> >>>>>> On Wed, May 13, 2015 at 9:40 AM, Ted Yu <yuzhihong@gmail.com>
> wrote:
> >>>>>>
> >>>>>> Have you enabled short circuit read ?
> >>>>>>
> >>>>>> Cheers
> >>>>>>
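
For reference, short-circuit reads hinge on two stock HDFS client
properties, normally set cluster-wide in hdfs-site.xml; a small
illustrative sketch (the socket path here is only an example and must match
the datanode configuration):

import org.apache.hadoop.conf.Configuration;

public class ShortCircuitConf {
    // Shows the two standard HDFS client properties behind short-circuit reads.
    static Configuration withShortCircuit(Configuration conf) {
        conf.setBoolean("dfs.client.read.shortcircuit", true);
        // Example path only: it must match what the datanodes were started with.
        conf.set("dfs.domain.socket.path", "/var/run/hdfs-sockets/dn");
        return conf;
    }
}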
> >>>>>> On Wed, May 13, 2015 at 9:37 AM, rahul malviya <
> >>>> malviyarahul2001@gmail.com
> >>>>>> wrote:
> >>>>>>
> >>>>>>> Hi,
> >>>>>>>
> >>>>>>> I have recently started running MR on hbase snapshots, but when the
> >>>>>>> MR is running there is pretty high CPU usage on the datanodes and I
> >>>>>>> start seeing IO wait messages in the datanode logs; as soon as I
> >>>>>>> kill the MR on the snapshot, everything comes back to normal.
> >>>>>>>
> >>>>>>> What could be causing this?
> >>>>>>>
> >>>>>>> I am running cdh5.2.0 distribution.
> >>>>>>>
> >>>>>>> Thanks,
> >>>>>>> Rahul
> >>>>
> >>>>
> >>
>
>
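
For context, a minimal sketch of how an MR job over an HBase snapshot is
typically wired up with TableMapReduceUtil in the HBase 0.98 API that ships
with CDH 5.2; the snapshot name, restore directory, and mapper below are
placeholders:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

public class SnapshotScanJob {
    // Placeholder mapper: counts rows read from the snapshot's HFiles.
    static class RowCounter extends TableMapper<NullWritable, NullWritable> {
        @Override
        protected void map(ImmutableBytesWritable key, Result value, Context ctx) {
            ctx.getCounter("snapshot", "rows").increment(1);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        Job job = Job.getInstance(conf, "scan-over-snapshot");
        job.setJarByClass(SnapshotScanJob.class);
        Scan scan = new Scan();
        scan.setCacheBlocks(false);  // standard for MR scans
        // Reads the snapshot's HFiles from HDFS directly, bypassing the region servers.
        TableMapReduceUtil.initTableSnapshotMapperJob(
                "my_snapshot",                       // placeholder snapshot name
                scan, RowCounter.class,
                NullWritable.class, NullWritable.class, job,
                true,                                // ship HBase jars with the job
                new Path("/tmp/snapshot-restore"));  // placeholder restore dir
        job.setOutputFormatClass(NullOutputFormat.class);
        job.setNumReduceTasks(0);
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}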
