hbase-dev mailing list archives

From Matteo Bertozzi <theo.berto...@gmail.com>
Subject Re: Has anyone tried profiling trunk recently?
Date Fri, 06 Nov 2015 07:25:06 GMT
do you know if that is just a burst on startup while the SSH is running, or
does it go on forever?
in theory it should be just on startup while the SSH is running; that is by
implementation, and I'm working on completing the procedure scheduler to
remove that spin on startup.

if it goes on forever, that is a bug that I'm still unable to reproduce.
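
to illustrate what I mean by "spin", here is a rough sketch (hypothetical
names, not the actual ProcedureExecutor code): while a queue exists but has
nothing runnable, e.g. the SSH work waiting on something else, poll() returns
null immediately and the loop retries without blocking, a busy spin:

  import java.util.Deque;
  import java.util.Map;
  import java.util.Set;
  import java.util.concurrent.ConcurrentHashMap;
  import java.util.concurrent.ConcurrentSkipListMap;

  public class SpinSketch {
    private final ConcurrentSkipListMap<String, Deque<Runnable>> queues =
        new ConcurrentSkipListMap<>();
    private final Set<String> suspended = ConcurrentHashMap.newKeySet();

    Runnable poll() {
      for (Map.Entry<String, Deque<Runnable>> e : queues.entrySet()) {
        if (suspended.contains(e.getKey())) continue; // skip blocked queue
        Runnable proc = e.getValue().poll();
        if (proc != null) return proc;
      }
      return null; // queues exist, but nothing is runnable right now
    }

    void execLoop() {
      while (true) {
        Runnable proc = poll();
        if (proc == null) continue; // retries immediately: burns CPU
        proc.run();
      }
    }
  }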


On Thu, Nov 5, 2015 at 10:45 PM, Andrew Purtell <apurtell@apache.org> wrote:

> It was definitely something to do with SSH. I was able to reproduce, then
> ran through this again but manually cleared away all files in WALs/* after
> shutdown, and then the profile was what I'd expect.
>
>
> On Thu, Nov 5, 2015 at 8:41 PM, Andrew Purtell <apurtell@apache.org>
> wrote:
>
> >
> >
> > On Thu, Nov 5, 2015 at 12:18 PM, Matteo Bertozzi <theo.bertozzi@gmail.com> wrote:
> >
> >> what is the shutdown and restart? a clean stop? kill?
> >>
> >
> > The shutdown was a ^C of the minicluster as launched out of the dev tree.
> > The termination looked normal to me, but I did see the SSH handler kick in
> > upon restart.
> >
> >> is the second YCSB job run on the cloned table?
> >>
> >
> > Yes
> >
> >
> >
> >>
> >> the only master operations with procedures that I see here are
> >>  - create meta & namespace table
> >>  - create the table for ycsb
> >>  - disable the table for snapshot
> >>  - maybe enable the table after restart?
> >>
> >> so, with 5 procedures running, let's say 10 steps each (exaggerating),
> >> you'll have 50 calls to poll()
> >> then we have the SSH procedure; how many servers does the cluster have?
> >>
> >
> > The minicluster in master embeds a zk quorum peer, the master, and two
> > regionservers.
> >
> >
> >
> >> maybe we keep spinning on the server actions after restart...
> >>
> >>
> > That was a thought I had as well, spinning on poll of the master
> > procedure queue after restart.
> >
> >
> >
> >>
> >>
> >> Matteo
> >>
> >>
> >> On Thu, Nov 5, 2015 at 11:47 AM, Andrew Purtell <andrew.purtell@gmail.com> wrote:
> >>
> >> > Here's what I did:
> >> >
> >> > 0. Launch minicluster out of dev tree with ./bin/hbase master start
> >> > 1. Load 10M rows using YCSB
> >> > 2. Snapshot
> >> > 3. Shutdown and restart
> >> > 4. Clone a test table from the snapshot
> >> > 5. Run YCSB workload A for 2M operations
> >> >
> >> > The profile was captured at step 5.
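> >> >
> >> > If it helps with reproducing, steps 2 and 4 can also be driven from the
> >> > Java client; a rough sketch (table and snapshot names are placeholders;
> >> > usertable is YCSB's default table):
> >> >
> >> > import org.apache.hadoop.conf.Configuration;
> >> > import org.apache.hadoop.hbase.HBaseConfiguration;
> >> > import org.apache.hadoop.hbase.TableName;
> >> > import org.apache.hadoop.hbase.client.Admin;
> >> > import org.apache.hadoop.hbase.client.Connection;
> >> > import org.apache.hadoop.hbase.client.ConnectionFactory;
> >> >
> >> > public class SnapshotCloneRepro {
> >> >   public static void main(String[] args) throws Exception {
> >> >     Configuration conf = HBaseConfiguration.create();
> >> >     try (Connection conn = ConnectionFactory.createConnection(conf);
> >> >          Admin admin = conn.getAdmin()) {
> >> >       // Step 2: snapshot the loaded YCSB table
> >> >       admin.snapshot("usertable-snap", TableName.valueOf("usertable"));
> >> >       // ... shutdown and restart the minicluster here (step 3) ...
> >> >       // Step 4: clone a test table from the snapshot
> >> >       admin.cloneSnapshot("usertable-snap", TableName.valueOf("testtable"));
> >> >     }
> >> >   }
> >> > }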
> >> >
> >> > HTH
> >> >
> >> >
> >> > > On Nov 5, 2015, at 10:40 AM, Matteo Bertozzi <theo.bertozzi@gmail.com> wrote:
> >> > >
> >> > > what operations do you have going on the HMaster?
> >> > > I did a simple test where there are no operations going on, and poll()
> >> > > doesn't get out of the waitCond.await()
> >> > > so unless you have operations on the master you'll never call
> >> > > queue.poll()
> >> > >
> >> > > or maybe there's a bug and the previous ops were hanging and triggered
> >> > > a loop.
> >> > > what did you execute on the master before?
> >> > > do you have a script or something that I can use to reproduce this?
> >> > >
> >> > > Matteo
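> >> > >
> >> > > for reference, the intended behavior is roughly this: a minimal sketch
> >> > > of a condition-gated poll (hypothetical names, not the actual
> >> > > MasterProcedureQueue), where an executor parks in waitCond.await()
> >> > > instead of spinning whenever nothing is queued:
> >> > >
> >> > > import java.util.ArrayDeque;
> >> > > import java.util.Deque;
> >> > > import java.util.concurrent.locks.Condition;
> >> > > import java.util.concurrent.locks.ReentrantLock;
> >> > >
> >> > > public class GatedQueue<T> {
> >> > >   private final ReentrantLock lock = new ReentrantLock();
> >> > >   private final Condition waitCond = lock.newCondition();
> >> > >   private final Deque<T> queue = new ArrayDeque<>();
> >> > >
> >> > >   public void add(T item) {
> >> > >     lock.lock();
> >> > >     try {
> >> > >       queue.addLast(item);
> >> > >       waitCond.signal(); // wake one waiting executor thread
> >> > >     } finally {
> >> > >       lock.unlock();
> >> > >     }
> >> > >   }
> >> > >
> >> > >   public T poll() throws InterruptedException {
> >> > >     lock.lock();
> >> > >     try {
> >> > >       while (queue.isEmpty()) {
> >> > >         waitCond.await(); // idle master: executor blocks here
> >> > >       }
> >> > >       return queue.pollFirst();
> >> > >     } finally {
> >> > >       lock.unlock();
> >> > >     }
> >> > >   }
> >> > > }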
> >> > >
> >> > >
> >> > > On Thu, Nov 5, 2015 at 10:25 AM, Matteo Bertozzi <theo.bertozzi@gmail.com>
> >> > > wrote:
> >> > >
> >> > >> let me take a look, but branch-1 and master should be the same on the
> >> > >> proc stuff
> >> > >>
> >> > >> Matteo
> >> > >>
> >> > >>
> >> > >>> On Thu, Nov 5, 2015 at 10:23 AM, Stack <stack@duboce.net> wrote:
> >> > >>>
> >> > >>> You are looking at master process only?
> >> > >>>
> >> > >>> I've been looking at branch-1 only of late.
> >> > >>>
> >> > >>> St.Ack
> >> > >>>
> >> > >>> On Thu, Nov 5, 2015 at 12:06 AM, Andrew Purtell <apurtell@apache.org>
> >> > >>> wrote:
> >> > >>>
> >> > >>>> I just profiled trunk running YCSB workload A. I was looking for
> >> > >>>> something else but found completely unexpected results.
> >> > >>>>
> >> > >>>> I see 60% of all allocation pressure in TLABs coming from
> >> > >>>> ProcedureExecutors, allocating 16 GB of
> >> > >>>> java.util.AbstractMap$SimpleImmutableEntry in this code:
> >> > >>>>
> >> > >>>> Stack Trace  TLABs  Total TLAB Size(bytes)  Pressure(%)
> >> > >>>> java.util.concurrent.ConcurrentSkipListMap$Node.createSnapshot()  11,603  18,402,380,120  100
> >> > >>>>   java.util.concurrent.ConcurrentSkipListMap.firstEntry()  11,599  18,401,678,776  99.996
> >> > >>>>     org.apache.hadoop.hbase.procedure2.ProcedureFairRunQueues.nextObject()  11,599  18,401,678,776  99.996
> >> > >>>>       org.apache.hadoop.hbase.procedure2.ProcedureFairRunQueues.poll()  11,599  18,401,678,776  99.996
> >> > >>>>         org.apache.hadoop.hbase.master.procedure.MasterProcedureQueue.poll()  11,599  18,401,678,776  99.996
> >> > >>>>           org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execLoop()  11,599  18,401,678,776  99.996
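> >> > >>>>
> >> > >>>> The createSnapshot() frames fit: ConcurrentSkipListMap.firstEntry()
> >> > >>>> returns an immutable snapshot of the first node, allocating a fresh
> >> > >>>> AbstractMap.SimpleImmutableEntry on every call. A standalone sketch
> >> > >>>> (not HBase code) that reproduces that allocation pattern:
> >> > >>>>
> >> > >>>> import java.util.concurrent.ConcurrentSkipListMap;
> >> > >>>>
> >> > >>>> public class FirstEntryPressure {
> >> > >>>>   public static void main(String[] args) {
> >> > >>>>     ConcurrentSkipListMap<String, String> map =
> >> > >>>>         new ConcurrentSkipListMap<>();
> >> > >>>>     map.put("q1", "queue");
> >> > >>>>     for (long i = 0; i < 100_000_000L; i++) {
> >> > >>>>       // Allocates a new AbstractMap.SimpleImmutableEntry each call:
> >> > >>>>       // the allocation site at the top of the TLAB profile above.
> >> > >>>>       map.firstEntry();
> >> > >>>>       // firstKey() + get(key) would avoid the entry allocation, at
> >> > >>>>       // the cost of two probes (and a race between them).
> >> > >>>>     }
> >> > >>>>   }
> >> > >>>> }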
> >> > >>>>
> >> > >>>>
> >> > >>>> The three ProcedureExecutor threads also consume about 60% of all
> >> > >>>> CPU time, each about 20%. Here are the top hot code arcs from one
> >> > >>>> of them:
> >> > >>>>
> >> > >>>> Stack Trace  Sample Count  Percentage(%)
> >> > >>>> org.apache.hadoop.hbase.procedure2.ProcedureFairRunQueues.poll()  1,521  39.191
> >> > >>>>   org.apache.hadoop.hbase.master.procedure.MasterProcedureQueue.poll()  1,521  39.191
> >> > >>>>     org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execLoop()  1,521  39.191
> >> > >>>> java.lang.String$CaseInsensitiveComparator.compare(String, String)  614  15.821
> >> > >>>>   java.lang.String$CaseInsensitiveComparator.compare(Object, Object)  614  15.821
> >> > >>>>     java.lang.String.compareToIgnoreCase(String)  614  15.821
> >> > >>>>       org.apache.hadoop.hbase.ServerName.compareTo(ServerName)  614  15.821
> >> > >>>>         org.apache.hadoop.hbase.ServerName.compareTo(Object)  614  15.821
> >> > >>>>           java.util.concurrent.ConcurrentSkipListMap.cpr(Comparator, Object, Object)  614  15.821
> >> > >>>>             java.util.concurrent.ConcurrentSkipListMap.findNear(Object, int, Comparator)  614  15.821
> >> > >>>>               java.util.concurrent.ConcurrentSkipListMap.getNear(Object, int)  614  15.821
> >> > >>>>                 java.util.concurrent.ConcurrentSkipListMap.higherEntry(Object)  614  15.821
> >> > >>>>                   org.apache.hadoop.hbase.procedure2.ProcedureFairRunQueues.nextObject()  614  15.821
> >> > >>>>                     org.apache.hadoop.hbase.procedure2.ProcedureFairRunQueues.poll()  614  15.821
> >> > >>>>                       org.apache.hadoop.hbase.master.procedure.MasterProcedureQueue.poll()  614  15.821
> >> > >>>>                         org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execLoop()  614  15.821
> >> > >>>> java.util.concurrent.ConcurrentSkipListMap$Node.createSnapshot()  468  12.059
> >> > >>>>   java.util.concurrent.ConcurrentSkipListMap.getNear(Object, int)  468  12.059
> >> > >>>>     java.util.concurrent.ConcurrentSkipListMap.higherEntry(Object)  468  12.059
> >> > >>>>       org.apache.hadoop.hbase.procedure2.ProcedureFairRunQueues.nextObject()  468  12.059
> >> > >>>>         org.apache.hadoop.hbase.procedure2.ProcedureFairRunQueues.poll()  468  12.059
> >> > >>>>           org.apache.hadoop.hbase.master.procedure.MasterProcedureQueue.poll()  468  12.059
> >> > >>>>             org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execLoop()  468  12.059
> >> > >>>> java.util.concurrent.ConcurrentSkipListMap.cpr(Comparator, Object, Object)  402  10.358
> >> > >>>>   java.util.concurrent.ConcurrentSkipListMap.findNear(Object, int, Comparator)  280  7.215
> >> > >>>>     java.util.concurrent.ConcurrentSkipListMap.getNear(Object, int)  280  7.215
> >> > >>>>       java.util.concurrent.ConcurrentSkipListMap.higherEntry(Object)  280  7.215
> >> > >>>>         org.apache.hadoop.hbase.procedure2.ProcedureFairRunQueues.nextObject()  280  7.215
> >> > >>>>           org.apache.hadoop.hbase.procedure2.ProcedureFairRunQueues.poll()  280  7.215
> >> > >>>>             org.apache.hadoop.hbase.master.procedure.MasterProcedureQueue.poll()  280  7.215
> >> > >>>>               org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execLoop()  280  7.215
> >> > >>>>   java.util.concurrent.ConcurrentSkipListMap.findPredecessor(Object, Comparator)  122  3.144
> >> > >>>>     java.util.concurrent.ConcurrentSkipListMap.findNear(Object, int, Comparator)  122  3.144
> >> > >>>>       java.util.concurrent.ConcurrentSkipListMap.getNear(Object, int)  122  3.144
> >> > >>>>         java.util.concurrent.ConcurrentSkipListMap.higherEntry(Object)  122  3.144
> >> > >>>>           org.apache.hadoop.hbase.procedure2.ProcedureFairRunQueues.nextObject()  122  3.144
> >> > >>>>             org.apache.hadoop.hbase.procedure2.ProcedureFairRunQueues.poll()  122  3.144
> >> > >>>>               org.apache.hadoop.hbase.master.procedure.MasterProcedureQueue.poll()  122  3.144
> >> > >>>>                 org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execLoop()  122  3.144
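> >> > >>>>
> >> > >>>> The CaseInsensitiveComparator arcs line up too: ServerName.compareTo()
> >> > >>>> compares names case-insensitively, and the trace suggests the fair run
> >> > >>>> queues round-robin over a ConcurrentSkipListMap keyed by ServerName, so
> >> > >>>> every higherEntry()/firstEntry() probe pays that comparison. A simplified
> >> > >>>> sketch of that round-robin pattern (not the actual ProcedureFairRunQueues):
> >> > >>>>
> >> > >>>> import java.util.Map;
> >> > >>>> import java.util.concurrent.ConcurrentSkipListMap;
> >> > >>>>
> >> > >>>> public class FairQueuesSketch<K, Q> {
> >> > >>>>   private final ConcurrentSkipListMap<K, Q> objMap =
> >> > >>>>       new ConcurrentSkipListMap<>();
> >> > >>>>   private Map.Entry<K, Q> current = null;
> >> > >>>>
> >> > >>>>   public void add(K key, Q queue) {
> >> > >>>>     objMap.put(key, queue);
> >> > >>>>   }
> >> > >>>>
> >> > >>>>   // Advance round-robin: higherEntry() walks the skip list (the
> >> > >>>>   // comparator cost above) and allocates a snapshot entry; so
> >> > >>>>   // does the firstEntry() wrap-around. Both run on every poll.
> >> > >>>>   public Q nextObject() {
> >> > >>>>     current = (current != null) ? objMap.higherEntry(current.getKey()) : null;
> >> > >>>>     if (current == null) {
> >> > >>>>       current = objMap.firstEntry();
> >> > >>>>     }
> >> > >>>>     return (current != null) ? current.getValue() : null;
> >> > >>>>   }
> >> > >>>> }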
> >> > >>>>
> >> > >>>>
> >> > >>>> Just me? Someone working on ProcedureV2 should have a look at WTH
> >> > >>>> the ProcedureExecutors are doing. They seem to be polling the run
> >> > >>>> queue constantly and burning up CPU. I'll try running more
> >> > >>>> workloads at the next opportunity and see if I can repeat this or
> >> > >>>> gain more insight.
> >> > >>>>
> >> > >>>>
> >> > >>>> --
> >> > >>>> Best regards,
> >> > >>>>
> >> > >>>>   - Andy
> >> > >>>>
> >> > >>>> Problems worthy of attack prove their worth by hitting back. - Piet
> >> > >>>> Hein (via Tom White)
> >> > >>
> >> > >>
> >> >
> >>
> >
> >
> >
> > --
> > Best regards,
> >
> >    - Andy
> >
> > Problems worthy of attack prove their worth by hitting back. - Piet Hein
> > (via Tom White)
> >
>
>
>
> --
> Best regards,
>
>    - Andy
>
> Problems worthy of attack prove their worth by hitting back. - Piet Hein
> (via Tom White)
>
