hbase-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Andrew Purtell <apurt...@apache.org>
Subject Re: Has anyone tried profiling trunk recently?
Date Fri, 06 Nov 2015 04:41:22 GMT
On Thu, Nov 5, 2015 at 12:18 PM, Matteo Bertozzi <theo.bertozzi@gmail.com>
wrote:

> what is the shutdown and restart? a clean stop? kill?
>

​The shutdown of a ^C of the minicluster as launched out of the dev tree.
The termination looked normal to be but I did see the SSH handler kick in
upon restart.​

is the second YCSB job run on the cloned table?
>

​Yes​



>
> the only master operation with procedures that I see here are
>  - create meta & namespace table
>  - create the table for ycsb
>  - disable the table for snapshot
>  - maybe enable for the table after restart?
>
> so, 5 procedure running let say 10 step each (exaggerating) you'll have 50
> calls to poll()
> then we have the SSH procedure, how many server does the cluster have?
>

​The minicluster in master embeds a zk quorum peer, the master, and two
regionservers.​



> maybe we keep spinning on the server actions after restart...
>
>
​That was a thought I had as well, spinning on poll of the master procedure
queue​ after restart.



>
>
> Matteo
>
>
> On Thu, Nov 5, 2015 at 11:47 AM, Andrew Purtell <andrew.purtell@gmail.com>
> wrote:
>
> > Here's what I did:
> >
> > 0. Launch minicluster out of dev tree with ./bin/hbase master start
> > 1. Load 10M rows using YCSB
> > 2. Snapshot
> > 3. Shutdown and restart
> > 4. Clone a test table from the snapshot
> > 5. Run YCSB workload A for 2M operations
> >
> > The profile was captured at step 5.
> >
> > HTH
> >
> >
> > > On Nov 5, 2015, at 10:40 AM, Matteo Bertozzi <theo.bertozzi@gmail.com>
> > wrote:
> > >
> > > what operation do you have going on the HMaster?
> > > I did a simple test where there are no operation going on and poll
> > doesn't
> > > get out of the waitCond.await()
> > > so unless you have operations on the master you'll never call
> > queue.poll()
> > >
> > > or maybe, there's a bug and the previous ops where hanging and trigger
> a
> > > loop.
> > > what did you executed on the master before?
> > > do you have a script or something that I can use to reproduce this?
> > >
> > > Matteo
> > >
> > >
> > > On Thu, Nov 5, 2015 at 10:25 AM, Matteo Bertozzi <
> > theo.bertozzi@gmail.com>
> > > wrote:
> > >
> > >> let me take a look, but branch-1 and master should be the same on the
> > proc
> > >> stuff
> > >>
> > >> Matteo
> > >>
> > >>
> > >>> On Thu, Nov 5, 2015 at 10:23 AM, Stack <stack@duboce.net> wrote:
> > >>>
> > >>> You are looking at master process only?
> > >>>
> > >>> I've been looking at branch-1 only of late.
> > >>>
> > >>> St.Ack
> > >>>
> > >>> On Thu, Nov 5, 2015 at 12:06 AM, Andrew Purtell <apurtell@apache.org
> >
> > >>> wrote:
> > >>>
> > >>>> I just profiled trunk running YCSB workload A. I was looking for
> > >>> something
> > >>>> else but found completely unexpected results.
> > >>>>
> > >>>> I see 60% of all allocation pressure in TLABs coming from
> > >>>> ProcedureExecutors, allocating 16 GB of
> > >>>> java.util.AbstractMap$SimpleImmutableEntry in this code:
> > >>>>
> > >>>> Stack Trace TLABs Total TLAB Size(bytes) Pressure(%)
> > >>>> java.util.concurrent.ConcurrentSkipListMap$Node.createSnapshot()
> > 11,603
> > >>>> 18,402,380,120 100
> > >>>>   java.util.concurrent.ConcurrentSkipListMap.firstEntry() 11,599
> > >>>> 18,401,678,776 99.996
> > >>>>
> > >>>>
> org.apache.hadoop.hbase.procedure2.ProcedureFairRunQueues.nextObject()
> > >>>> 11,599 18,401,678,776 99.996
> > >>> org.apache.hadoop.hbase.procedure2.ProcedureFairRunQueues.poll()
> > >>>> 11,599 18,401,678,776 99.996
> > >>>>
> > >>>> org.apache.hadoop.hbase.master.procedure.MasterProcedureQueue.poll()
> > >>> 11,599
> > >>>> 18,401,678,776 99.996
> > >>>>
> > >>>> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execLoop()
> 11,599
> > >>>> 18,401,678,776 99.996
> > >>>>
> > >>>>
> > >>>> The three ProcedureExecutor threads also consume about 60% of all
> CPU
> > >>> time,
> > >>>> each about 20%. Here are the top hot code arcs from one of them:
> > >>>>
> > >>>> Stack Trace Sample Count Percentage(%)
> > >>>> org.apache.hadoop.hbase.procedure2.ProcedureFairRunQueues.poll()
> 1,521
> > >>>> 39.191
> > >>>>
>  org.apache.hadoop.hbase.master.procedure.MasterProcedureQueue.poll()
> > >>>> 1,521 39.191
> > >>>>      org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execLoop()
> > >>> 1,521
> > >>>> 39.191
> > >>>> java.lang.String$CaseInsensitiveComparator.compare(String, String)
> 614
> > >>>> 15.821
> > >>>>   java.lang.String$CaseInsensitiveComparator.compare(Object, Object)
> > >>> 614
> > >>>> 15.821
> > >>>>      java.lang.String.compareToIgnoreCase(String) 614 15.821
> > >>>>         org.apache.hadoop.hbase.ServerName.compareTo(ServerName)
614
> > >>>> 15.821
> > >>>>            org.apache.hadoop.hbase.ServerName.compareTo(Object)
614
> > >>> 15.821
> > >>> java.util.concurrent.ConcurrentSkipListMap.cpr(Comparator,
> > >>>> Object, Object) 614 15.821
> > >>>>
> > >>>> java.util.concurrent.ConcurrentSkipListMap.findNear(Object, int,
> > >>>> Comparator)
> > >>>> 614 15.821
> > >>>>
> > >>>> java.util.concurrent.ConcurrentSkipListMap.getNear(Object, int)
614
> > >>> 15.821
> > >>>>
> > >>>> java.util.concurrent.ConcurrentSkipListMap.higherEntry(Object)
614
> > >>> 15.821
> > >>>>
> > >>>>
> org.apache.hadoop.hbase.procedure2.ProcedureFairRunQueues.nextObject()
> > >>> 614
> > >>>> 15.821
> > >>>>
> > >>>> org.apache.hadoop.hbase.procedure2.ProcedureFairRunQueues.poll()
614
> > >>> 15.821
> > >>>>
> > >>>> org.apache.hadoop.hbase.master.procedure.MasterProcedureQueue.poll()
> > >>> 614
> > >>>> 15.821
> > >>>>
> > >>>> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execLoop()
614
> > >>> 15.821
> > >>>> java.util.concurrent.ConcurrentSkipListMap$Node.createSnapshot()
468
> > >>> 12.059
> > >>>>   java.util.concurrent.ConcurrentSkipListMap.getNear(Object, int)
> 468
> > >>>> 12.059
> > >>>>      java.util.concurrent.ConcurrentSkipListMap.higherEntry(Object)
> > 468
> > >>>> 12.059
> > >>>>
> > >>>>
> org.apache.hadoop.hbase.procedure2.ProcedureFairRunQueues.nextObject()
> > >>> 468
> > >>>> 12.059
> > >>>>
> > >>>> org.apache.hadoop.hbase.procedure2.ProcedureFairRunQueues.poll()
> > >>>> 468 12.059
> > >>>>
> > >>>> org.apache.hadoop.hbase.master.procedure.MasterProcedureQueue.poll()
> > >>> 468
> > >>>> 12.059
> > >>>>
> > >>>> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execLoop()
468
> > >>> 12.059
> > >>>> java.util.concurrent.ConcurrentSkipListMap.cpr(Comparator, Object,
> > >>> Object)
> > >>>> 402 10.358
> > >>>>   java.util.concurrent.ConcurrentSkipListMap.findNear(Object, int,
> > >>>> Comparator) 280 7.215
> > >>>>      java.util.concurrent.ConcurrentSkipListMap.getNear(Object,
int)
> > >>> 280
> > >>>> 7.215
> > >>>>
>  java.util.concurrent.ConcurrentSkipListMap.higherEntry(Object)
> > >>> 280
> > >>>> 7.215
> > >>>>
> > >>>>
> org.apache.hadoop.hbase.procedure2.ProcedureFairRunQueues.nextObject()
> > >>> 280
> > >>>> 7.215
> > >>>>
> > >>>> org.apache.hadoop.hbase.procedure2.ProcedureFairRunQueues.poll()
280
> > >>> 7.215
> > >>>>
> > >>>> org.apache.hadoop.hbase.master.procedure.MasterProcedureQueue.poll()
> > 280
> > >>>> 7.215
> > >>>>
> > >>>> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execLoop()
280
> > >>> 7.215
> > >>>>   java.util.concurrent.ConcurrentSkipListMap.findPredecessor(Object,
> > >>>> Comparator) 122 3.144
> > >>>>      java.util.concurrent.ConcurrentSkipListMap.findNear(Object,
> int,
> > >>>> Comparator) 122 3.144
> > >>>>         java.util.concurrent.ConcurrentSkipListMap.getNear(Object,
> > int)
> > >>>> 122
> > >>>> 3.144
> > >>> java.util.concurrent.ConcurrentSkipListMap.higherEntry(Object)
> > >>>> 122 3.144
> > >>>>
> > >>>>
> org.apache.hadoop.hbase.procedure2.ProcedureFairRunQueues.nextObject()
> > >>> 122
> > >>>> 3.144
> > >>>>
> > >>>> org.apache.hadoop.hbase.procedure2.ProcedureFairRunQueues.poll()
122
> > >>> 3.144
> > >>>>
> > >>>> org.apache.hadoop.hbase.master.procedure.MasterProcedureQueue.poll()
> > >>> 122
> > >>>> 3.144
> > >>>>
> > >>>> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execLoop()
122
> > >>> 3.144
> > >>>>
> > >>>>
> > >>>> ​Just me? Someone working on ProcedureV2 should have a look at
WTH
> the
> > >>>> ProcedureExecutors are doing. They seem to be polling the run queue
> > >>>> constantly and burning up CPU. I'll try running more workloads
at
> the
> > >>> next
> > >>>> opportunity and see if I can repeat this or gain more insight.
> > >>>>
> > >>>>
> > >>>> --
> > >>>> Best regards,
> > >>>>
> > >>>>   - Andy
> > >>>>
> > >>>> Problems worthy of attack prove their worth by hitting back. -
Piet
> > Hein
> > >>>> (via Tom White)
> > >>
> > >>
> >
>



-- 
Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein
(via Tom White)

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message