From: Andrew Purtell
Date: Thu, 5 Nov 2015 22:45:24 -0800
To: dev@hbase.apache.org
Subject: Re: Has anyone tried profiling trunk recently?

It was definitely something to do with SSH. I was able to reproduce, then
ran through this again but manually cleared away all files in WALs/* after
shutdown, and then the profile was what I'd expect.
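
For illustration of the manual cleanup mentioned above, here is a small,
JDK-only sketch that deletes leftover WAL files under a local minicluster's
data directory so the restarted master finds nothing to recover. The WALs
path is a placeholder (it depends on where the dev-tree minicluster puts
hbase.rootdir), and this is not a command taken from the thread:

    import java.io.IOException;
    import java.io.UncheckedIOException;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.util.Comparator;
    import java.util.stream.Stream;

    public class ClearLocalWals {
        public static void main(String[] args) throws IOException {
            // Placeholder path: adjust to the minicluster's actual hbase.rootdir.
            Path walsDir = Paths.get("/tmp/hbase-rootdir/WALs");
            if (!Files.isDirectory(walsDir)) {
                return; // nothing to clean up
            }
            try (Stream<Path> entries = Files.walk(walsDir)) {
                entries.sorted(Comparator.reverseOrder()) // delete children before parents
                       .filter(p -> !p.equals(walsDir))   // keep the WALs directory itself
                       .forEach(p -> {
                           try {
                               Files.delete(p);
                           } catch (IOException e) {
                               throw new UncheckedIOException(e);
                           }
                       });
            }
        }
    }
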
On Thu, Nov 5, 2015 at 8:41 PM, Andrew Purtell wrote:

> On Thu, Nov 5, 2015 at 12:18 PM, Matteo Bertozzi wrote:
>
>> what is the shutdown and restart? a clean stop? kill?
>
> The shutdown was a ^C of the minicluster as launched out of the dev tree.
> The termination looked normal to me, but I did see the SSH handler kick in
> upon restart.
>
>> is the second YCSB job run on the cloned table?
>
> Yes
>
>> the only master operations with procedures that I see here are
>> - create meta & namespace table
>> - create the table for ycsb
>> - disable the table for snapshot
>> - maybe enable the table after restart?
>>
>> so, 5 procedures running, let's say 10 steps each (exaggerating), you'll
>> have 50 calls to poll().
>> then we have the SSH procedure; how many servers does the cluster have?
>
> The minicluster in master embeds a zk quorum peer, the master, and two
> regionservers.
>
>> maybe we keep spinning on the server actions after restart...
>
> That was a thought I had as well, spinning on poll of the master procedure
> queue after restart.
>
>> Matteo
>>
>> On Thu, Nov 5, 2015 at 11:47 AM, Andrew Purtell wrote:
>>
>>> Here's what I did:
>>>
>>> 0. Launch minicluster out of dev tree with ./bin/hbase master start
>>> 1. Load 10M rows using YCSB
>>> 2. Snapshot
>>> 3. Shutdown and restart
>>> 4. Clone a test table from the snapshot
>>> 5. Run YCSB workload A for 2M operations
>>>
>>> The profile was captured at step 5.
>>>
>>> HTH
>>>
>>> On Nov 5, 2015, at 10:40 AM, Matteo Bertozzi wrote:
>>>
>>>> what operations do you have going on in the HMaster?
>>>> I did a simple test where there are no operations going on, and poll
>>>> doesn't get out of the waitCond.await(), so unless you have operations
>>>> on the master you'll never call queue.poll().
>>>>
>>>> or maybe there's a bug and the previous ops were hanging and triggered
>>>> a loop.
>>>> what did you execute on the master before?
>>>> do you have a script or something that I can use to reproduce this?
>>>>
>>>> Matteo
>>>>
>>>> On Thu, Nov 5, 2015 at 10:25 AM, Matteo Bertozzi
>>>> <theo.bertozzi@gmail.com> wrote:
>>>>
>>>>> let me take a look, but branch-1 and master should be the same on the
>>>>> proc stuff
>>>>>
>>>>> Matteo
>>>>>
>>>>> On Thu, Nov 5, 2015 at 10:23 AM, Stack wrote:
>>>>>
>>>>>> You are looking at the master process only?
>>>>>>
>>>>>> I've been looking at branch-1 only of late.
>>>>>>
>>>>>> St.Ack
>>>>>>
>>>>>> On Thu, Nov 5, 2015 at 12:06 AM, Andrew Purtell <apurtell@apache.org>
>>>>>> wrote:
>>>>>>
>>>>>>> I just profiled trunk running YCSB workload A. I was looking for
>>>>>>> something else but found completely unexpected results.
>>>>>>>
>>>>>>> I see 60% of all allocation pressure in TLABs coming from
>>>>>>> ProcedureExecutors, allocating 16 GB of
>>>>>>> java.util.AbstractMap$SimpleImmutableEntry in this code:
>>>>>>>
>>>>>>> Stack Trace  TLABs  Total TLAB Size (bytes)  Pressure (%)
>>>>>>> java.util.concurrent.ConcurrentSkipListMap$Node.createSnapshot()  11,603  18,402,380,120  100
>>>>>>>   java.util.concurrent.ConcurrentSkipListMap.firstEntry()  11,599  18,401,678,776  99.996
>>>>>>>     org.apache.hadoop.hbase.procedure2.ProcedureFairRunQueues.nextObject()  11,599  18,401,678,776  99.996
>>>>>>>       org.apache.hadoop.hbase.procedure2.ProcedureFairRunQueues.poll()  11,599  18,401,678,776  99.996
>>>>>>>         org.apache.hadoop.hbase.master.procedure.MasterProcedureQueue.poll()  11,599  18,401,678,776  99.996
>>>>>>>           org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execLoop()  11,599  18,401,678,776  99.996
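
For context on the allocation stacks above: every call to
ConcurrentSkipListMap.firstEntry() builds a fresh immutable snapshot entry
(an AbstractMap.SimpleImmutableEntry, created by Node.createSnapshot()), so a
loop that polls the map on each pass allocates one entry object per pass even
when the map never changes. A minimal, self-contained JDK sketch (not HBase
code) of that behavior:

    import java.util.Map;
    import java.util.concurrent.ConcurrentSkipListMap;

    public class FirstEntryAllocation {
        public static void main(String[] args) {
            ConcurrentSkipListMap<String, String> queues = new ConcurrentSkipListMap<>();
            queues.put("server-a", "queue-a");

            Map.Entry<String, String> previous = null;
            for (int i = 0; i < 5; i++) {
                // Each call returns a brand new snapshot entry object.
                Map.Entry<String, String> entry = queues.firstEntry();
                System.out.println(entry + ", same object as previous call? " + (entry == previous));
                previous = entry;
            }
            // firstKey() returns just the key and does not allocate an entry snapshot.
            System.out.println("first key: " + queues.firstKey());
        }
    }
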
>>>>>>> The three ProcedureExecutor threads also consume about 60% of all CPU
>>>>>>> time, each about 20%. Here are the top hot code arcs from one of them:
>>>>>>>
>>>>>>> Stack Trace  Sample Count  Percentage (%)
>>>>>>> org.apache.hadoop.hbase.procedure2.ProcedureFairRunQueues.poll()  1,521  39.191
>>>>>>>   org.apache.hadoop.hbase.master.procedure.MasterProcedureQueue.poll()  1,521  39.191
>>>>>>>     org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execLoop()  1,521  39.191
>>>>>>> java.lang.String$CaseInsensitiveComparator.compare(String, String)  614  15.821
>>>>>>>   java.lang.String$CaseInsensitiveComparator.compare(Object, Object)  614  15.821
>>>>>>>     java.lang.String.compareToIgnoreCase(String)  614  15.821
>>>>>>>       org.apache.hadoop.hbase.ServerName.compareTo(ServerName)  614  15.821
>>>>>>>         org.apache.hadoop.hbase.ServerName.compareTo(Object)  614  15.821
>>>>>>>           java.util.concurrent.ConcurrentSkipListMap.cpr(Comparator, Object, Object)  614  15.821
>>>>>>>             java.util.concurrent.ConcurrentSkipListMap.findNear(Object, int, Comparator)  614  15.821
>>>>>>>               java.util.concurrent.ConcurrentSkipListMap.getNear(Object, int)  614  15.821
>>>>>>>                 java.util.concurrent.ConcurrentSkipListMap.higherEntry(Object)  614  15.821
>>>>>>>                   org.apache.hadoop.hbase.procedure2.ProcedureFairRunQueues.nextObject()  614  15.821
>>>>>>>                     org.apache.hadoop.hbase.procedure2.ProcedureFairRunQueues.poll()  614  15.821
>>>>>>>                       org.apache.hadoop.hbase.master.procedure.MasterProcedureQueue.poll()  614  15.821
>>>>>>>                         org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execLoop()  614  15.821
>>>>>>> java.util.concurrent.ConcurrentSkipListMap$Node.createSnapshot()  468  12.059
>>>>>>>   java.util.concurrent.ConcurrentSkipListMap.getNear(Object, int)  468  12.059
>>>>>>>     java.util.concurrent.ConcurrentSkipListMap.higherEntry(Object)  468  12.059
>>>>>>>       org.apache.hadoop.hbase.procedure2.ProcedureFairRunQueues.nextObject()  468  12.059
>>>>>>>         org.apache.hadoop.hbase.procedure2.ProcedureFairRunQueues.poll()  468  12.059
>>>>>>>           org.apache.hadoop.hbase.master.procedure.MasterProcedureQueue.poll()  468  12.059
>>>>>>>             org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execLoop()  468  12.059
>>>>>>> java.util.concurrent.ConcurrentSkipListMap.cpr(Comparator, Object, Object)  402  10.358
>>>>>>> java.util.concurrent.ConcurrentSkipListMap.findNear(Object, int, Comparator)  280  7.215
>>>>>>>   java.util.concurrent.ConcurrentSkipListMap.getNear(Object, int)  280  7.215
>>>>>>>     java.util.concurrent.ConcurrentSkipListMap.higherEntry(Object)  280  7.215
>>>>>>>       org.apache.hadoop.hbase.procedure2.ProcedureFairRunQueues.nextObject()  280  7.215
>>>>>>>         org.apache.hadoop.hbase.procedure2.ProcedureFairRunQueues.poll()  280  7.215
>>>>>>>           org.apache.hadoop.hbase.master.procedure.MasterProcedureQueue.poll()  280  7.215
>>>>>>>             org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execLoop()  280  7.215
>>>>>>> java.util.concurrent.ConcurrentSkipListMap.findPredecessor(Object, Comparator)  122  3.144
>>>>>>>   java.util.concurrent.ConcurrentSkipListMap.findNear(Object, int, Comparator)  122  3.144
>>>>>>>     java.util.concurrent.ConcurrentSkipListMap.getNear(Object, int)  122  3.144
>>>>>>>       java.util.concurrent.ConcurrentSkipListMap.higherEntry(Object)  122  3.144
>>>>>>>         org.apache.hadoop.hbase.procedure2.ProcedureFairRunQueues.nextObject()  122  3.144
>>>>>>>           org.apache.hadoop.hbase.procedure2.ProcedureFairRunQueues.poll()  122  3.144
>>>>>>>             org.apache.hadoop.hbase.master.procedure.MasterProcedureQueue.poll()  122  3.144
>>>>>>>               org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execLoop()  122  3.144
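
The 614-, 468-, 280-, and 122-sample chains above all funnel through
ProcedureFairRunQueues.nextObject() stepping a ConcurrentSkipListMap via
higherEntry()/firstEntry(). A simplified sketch of that access pattern (this
is not the actual ProcedureFairRunQueues code, and CountingKey below is a
hypothetical stand-in for ServerName's case-insensitive ordering): every
round-robin step re-walks the skip list through the key comparator and
allocates a snapshot entry, which is exactly where the samples land when the
loop spins.

    import java.util.Map;
    import java.util.concurrent.ConcurrentSkipListMap;
    import java.util.concurrent.atomic.AtomicLong;

    public class RoundRobinOverSkipList {
        static final AtomicLong comparisons = new AtomicLong();

        // Hypothetical stand-in for ServerName: natural order is case-insensitive.
        static final class CountingKey implements Comparable<CountingKey> {
            final String name;
            CountingKey(String name) { this.name = name; }
            @Override public int compareTo(CountingKey other) {
                comparisons.incrementAndGet();
                return name.compareToIgnoreCase(other.name);
            }
            @Override public String toString() { return name; }
        }

        public static void main(String[] args) {
            ConcurrentSkipListMap<CountingKey, String> fairQueues = new ConcurrentSkipListMap<>();
            for (int i = 0; i < 16; i++) {
                fairQueues.put(new CountingKey("server-" + i), "queue-" + i);
            }
            comparisons.set(0); // ignore comparisons done during insertion

            CountingKey current = null;
            for (int i = 0; i < 1000; i++) {
                // Round-robin step: successor of the current key, wrapping to the first.
                Map.Entry<CountingKey, String> next =
                    (current == null) ? fairQueues.firstEntry() : fairQueues.higherEntry(current);
                if (next == null) {
                    next = fairQueues.firstEntry(); // wrapped past the last key
                }
                current = next.getKey(); // each step also allocated a new snapshot entry
            }
            System.out.println("comparator calls for 1000 round-robin steps: " + comparisons.get());
        }
    }
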
>>>>>>> Just me? Someone working on ProcedureV2 should have a look at WTH the
>>>>>>> ProcedureExecutors are doing. They seem to be polling the run queue
>>>>>>> constantly and burning up CPU. I'll try running more workloads at the
>>>>>>> next opportunity and see if I can repeat this or gain more insight.
>>>>>>>
>>>>>>> --
>>>>>>> Best regards,
>>>>>>>
>>>>>>> - Andy
>>>>>>>
>>>>>>> Problems worthy of attack prove their worth by hitting back. - Piet Hein
>>>>>>> (via Tom White)
>
> --
> Best regards,
>
> - Andy
>
> Problems worthy of attack prove their worth by hitting back. - Piet Hein
> (via Tom White)

--
Best regards,

- Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein
(via Tom White)
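
Finally, as a generic point of comparison for the "polling the run queue
constantly" observation (a plain java.util.concurrent sketch, not the
ProcedureExecutor or MasterProcedureQueue implementation): a consumer that
parks on a Condition until work is signalled, as Matteo describes for
waitCond.await(), spends no CPU and allocates nothing while the queue is
idle, whereas a loop that re-polls on every pass does.

    import java.util.ArrayDeque;
    import java.util.Queue;
    import java.util.concurrent.locks.Condition;
    import java.util.concurrent.locks.ReentrantLock;

    public class AwaitGuardedQueue<T> {
        private final Queue<T> queue = new ArrayDeque<>();
        private final ReentrantLock lock = new ReentrantLock();
        private final Condition notEmpty = lock.newCondition();

        public void add(T item) {
            lock.lock();
            try {
                queue.add(item);
                notEmpty.signal(); // wake one waiting worker
            } finally {
                lock.unlock();
            }
        }

        // Blocks until an item is available instead of spinning on poll().
        public T take() throws InterruptedException {
            lock.lock();
            try {
                while (queue.isEmpty()) {
                    notEmpty.await(); // worker parks here; no CPU burned while idle
                }
                return queue.poll();
            } finally {
                lock.unlock();
            }
        }
    }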