From: Andrew Purtell
Date: Thu, 5 Nov 2015 22:45:24 -0800
To: dev@hbase.apache.org
Subject: Re: Has anyone tried profiling trunk recently?

It was definitely something to do with SSH. I was able to reproduce, then
ran through this again but manually cleared away all files in WALs/* after
shutdown, and then the profile was what I'd expect.
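
For illustration of the manual cleanup mentioned above, here is a small,
JDK-only sketch that deletes leftover WAL files under a local minicluster's
data directory so the restarted master finds nothing to recover. The WALs
path is a placeholder (it depends on where the dev-tree minicluster puts
hbase.rootdir), and this is not a command taken from the thread:

    import java.io.IOException;
    import java.io.UncheckedIOException;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.util.Comparator;
    import java.util.stream.Stream;

    public class ClearLocalWals {
        public static void main(String[] args) throws IOException {
            // Placeholder path: adjust to the minicluster's actual hbase.rootdir.
            Path walsDir = Paths.get("/tmp/hbase-rootdir/WALs");
            if (!Files.isDirectory(walsDir)) {
                return; // nothing to clean up
            }
            try (Stream<Path> entries = Files.walk(walsDir)) {
                entries.sorted(Comparator.reverseOrder()) // delete children before parents
                       .filter(p -> !p.equals(walsDir))   // keep the WALs directory itself
                       .forEach(p -> {
                           try {
                               Files.delete(p);
                           } catch (IOException e) {
                               throw new UncheckedIOException(e);
                           }
                       });
            }
        }
    }
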
On Thu, Nov 5, 2015 at 8:41 PM, Andrew Purtell wrote:

> On Thu, Nov 5, 2015 at 12:18 PM, Matteo Bertozzi wrote:
>
>> what is the shutdown and restart? a clean stop? kill?
>
> The shutdown was a ^C of the minicluster as launched out of the dev tree.
> The termination looked normal to me, but I did see the SSH handler kick in
> upon restart.
>
>> is the second YCSB job run on the cloned table?
>
> Yes
>
>> the only master operations with procedures that I see here are
>> - create meta & namespace table
>> - create the table for ycsb
>> - disable the table for snapshot
>> - maybe enable the table after restart?
>>
>> so, 5 procedures running, let's say 10 steps each (exaggerating), you'll
>> have 50 calls to poll().
>> then we have the SSH procedure; how many servers does the cluster have?
>
> The minicluster in master embeds a zk quorum peer, the master, and two
> regionservers.
>
>> maybe we keep spinning on the server actions after restart...
>
> That was a thought I had as well, spinning on poll of the master procedure
> queue after restart.
>
>> Matteo
>>
>> On Thu, Nov 5, 2015 at 11:47 AM, Andrew Purtell wrote:
>>
>>> Here's what I did:
>>>
>>> 0. Launch minicluster out of dev tree with ./bin/hbase master start
>>> 1. Load 10M rows using YCSB
>>> 2. Snapshot
>>> 3. Shutdown and restart
>>> 4. Clone a test table from the snapshot
>>> 5. Run YCSB workload A for 2M operations
>>>
>>> The profile was captured at step 5.
>>>
>>> HTH
>>>
>>> On Nov 5, 2015, at 10:40 AM, Matteo Bertozzi wrote:
>>>
>>>> what operations do you have going on in the HMaster?
>>>> I did a simple test where there are no operations going on, and poll
>>>> doesn't get out of the waitCond.await(), so unless you have operations
>>>> on the master you'll never call queue.poll().
>>>>
>>>> or maybe there's a bug and the previous ops were hanging and triggered
>>>> a loop.
>>>> what did you execute on the master before?
>>>> do you have a script or something that I can use to reproduce this?
>>>>
>>>> Matteo
>>>>
>>>> On Thu, Nov 5, 2015 at 10:25 AM, Matteo Bertozzi
>>>> <theo.bertozzi@gmail.com> wrote:
>>>>
>>>>> let me take a look, but branch-1 and master should be the same on the
>>>>> proc stuff
>>>>>
>>>>> Matteo
>>>>>
>>>>> On Thu, Nov 5, 2015 at 10:23 AM, Stack wrote:
>>>>>
>>>>>> You are looking at the master process only?
>>>>>>
>>>>>> I've been looking at branch-1 only of late.
>>>>>>
>>>>>> St.Ack
>>>>>>
>>>>>> On Thu, Nov 5, 2015 at 12:06 AM, Andrew Purtell <apurtell@apache.org>
>>>>>> wrote:
>>>>>>
>>>>>>> I just profiled trunk running YCSB workload A. I was looking for
>>>>>>> something else but found completely unexpected results.
>>>>>>>
>>>>>>> I see 60% of all allocation pressure in TLABs coming from
>>>>>>> ProcedureExecutors, allocating 16 GB of
>>>>>>> java.util.AbstractMap$SimpleImmutableEntry in this code:
>>>>>>>
>>>>>>> Stack Trace  TLABs  Total TLAB Size (bytes)  Pressure (%)
>>>>>>> java.util.concurrent.ConcurrentSkipListMap$Node.createSnapshot()  11,603  18,402,380,120  100
>>>>>>>   java.util.concurrent.ConcurrentSkipListMap.firstEntry()  11,599  18,401,678,776  99.996
>>>>>>>     org.apache.hadoop.hbase.procedure2.ProcedureFairRunQueues.nextObject()  11,599  18,401,678,776  99.996
>>>>>>>       org.apache.hadoop.hbase.procedure2.ProcedureFairRunQueues.poll()  11,599  18,401,678,776  99.996
>>>>>>>         org.apache.hadoop.hbase.master.procedure.MasterProcedureQueue.poll()  11,599  18,401,678,776  99.996
>>>>>>>           org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execLoop()  11,599  18,401,678,776  99.996
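
For context on the allocation stacks above: every call to
ConcurrentSkipListMap.firstEntry() builds a fresh immutable snapshot entry
(an AbstractMap.SimpleImmutableEntry, created by Node.createSnapshot()), so a
loop that polls the map on each pass allocates one entry object per pass even
when the map never changes. A minimal, self-contained JDK sketch (not HBase
code) of that behavior:

    import java.util.Map;
    import java.util.concurrent.ConcurrentSkipListMap;

    public class FirstEntryAllocation {
        public static void main(String[] args) {
            ConcurrentSkipListMap<String, String> queues = new ConcurrentSkipListMap<>();
            queues.put("server-a", "queue-a");

            Map.Entry<String, String> previous = null;
            for (int i = 0; i < 5; i++) {
                // Each call returns a brand new snapshot entry object.
                Map.Entry<String, String> entry = queues.firstEntry();
                System.out.println(entry + ", same object as previous call? " + (entry == previous));
                previous = entry;
            }
            // firstKey() returns just the key and does not allocate an entry snapshot.
            System.out.println("first key: " + queues.firstKey());
        }
    }
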
>>>>>>> The three ProcedureExecutor threads also consume about 60% of all CPU
>>>>>>> time, each about 20%. Here are the top hot code arcs from one of them:
>>>>>>>
>>>>>>> Stack Trace  Sample Count  Percentage (%)
>>>>>>> org.apache.hadoop.hbase.procedure2.ProcedureFairRunQueues.poll()  1,521  39.191
>>>>>>>   org.apache.hadoop.hbase.master.procedure.MasterProcedureQueue.poll()  1,521  39.191
>>>>>>>     org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execLoop()  1,521  39.191
>>>>>>> java.lang.String$CaseInsensitiveComparator.compare(String, String)  614  15.821
>>>>>>>   java.lang.String$CaseInsensitiveComparator.compare(Object, Object)  614  15.821
>>>>>>>     java.lang.String.compareToIgnoreCase(String)  614  15.821
>>>>>>>       org.apache.hadoop.hbase.ServerName.compareTo(ServerName)  614  15.821
>>>>>>>         org.apache.hadoop.hbase.ServerName.compareTo(Object)  614  15.821
>>>>>>>           java.util.concurrent.ConcurrentSkipListMap.cpr(Comparator, Object, Object)  614  15.821
>>>>>>>             java.util.concurrent.ConcurrentSkipListMap.findNear(Object, int, Comparator)  614  15.821
>>>>>>>               java.util.concurrent.ConcurrentSkipListMap.getNear(Object, int)  614  15.821
>>>>>>>                 java.util.concurrent.ConcurrentSkipListMap.higherEntry(Object)  614  15.821
>>>>>>>                   org.apache.hadoop.hbase.procedure2.ProcedureFairRunQueues.nextObject()  614  15.821
>>>>>>>                     org.apache.hadoop.hbase.procedure2.ProcedureFairRunQueues.poll()  614  15.821
>>>>>>>                       org.apache.hadoop.hbase.master.procedure.MasterProcedureQueue.poll()  614  15.821
>>>>>>>                         org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execLoop()  614  15.821
>>>>>>> java.util.concurrent.ConcurrentSkipListMap$Node.createSnapshot()  468  12.059
>>>>>>>   java.util.concurrent.ConcurrentSkipListMap.getNear(Object, int)  468  12.059
>>>>>>>     java.util.concurrent.ConcurrentSkipListMap.higherEntry(Object)  468  12.059
>>>>>>>       org.apache.hadoop.hbase.procedure2.ProcedureFairRunQueues.nextObject()  468  12.059
>>>>>>>         org.apache.hadoop.hbase.procedure2.ProcedureFairRunQueues.poll()  468  12.059
>>>>>>>           org.apache.hadoop.hbase.master.procedure.MasterProcedureQueue.poll()  468  12.059
>>>>>>>             org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execLoop()  468  12.059
>>>>>>> java.util.concurrent.ConcurrentSkipListMap.cpr(Comparator, Object, Object)  402  10.358
>>>>>>> java.util.concurrent.ConcurrentSkipListMap.findNear(Object, int, Comparator)  280  7.215
>>>>>>>   java.util.concurrent.ConcurrentSkipListMap.getNear(Object, int)  280  7.215
>>>>>>>     java.util.concurrent.ConcurrentSkipListMap.higherEntry(Object)  280  7.215
>>>>>>>       org.apache.hadoop.hbase.procedure2.ProcedureFairRunQueues.nextObject()  280  7.215
>>>>>>>         org.apache.hadoop.hbase.procedure2.ProcedureFairRunQueues.poll()  280  7.215
>>>>>>>           org.apache.hadoop.hbase.master.procedure.MasterProcedureQueue.poll()  280  7.215
>>>>>>>             org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execLoop()  280  7.215
>>>>>>> java.util.concurrent.ConcurrentSkipListMap.findPredecessor(Object, Comparator)  122  3.144
>>>>>>>   java.util.concurrent.ConcurrentSkipListMap.findNear(Object, int, Comparator)  122  3.144
>>>>>>>     java.util.concurrent.ConcurrentSkipListMap.getNear(Object, int)  122  3.144
>>>>>>>       java.util.concurrent.ConcurrentSkipListMap.higherEntry(Object)  122  3.144
>>>>>>>         org.apache.hadoop.hbase.procedure2.ProcedureFairRunQueues.nextObject()  122  3.144
>>>>>>>           org.apache.hadoop.hbase.procedure2.ProcedureFairRunQueues.poll()  122  3.144
>>>>>>>             org.apache.hadoop.hbase.master.procedure.MasterProcedureQueue.poll()  122  3.144
>>>>>>>               org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execLoop()  122  3.144
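
The 614-, 468-, 280-, and 122-sample chains above all funnel through
ProcedureFairRunQueues.nextObject() stepping a ConcurrentSkipListMap via
higherEntry()/firstEntry(). A simplified sketch of that access pattern (this
is not the actual ProcedureFairRunQueues code, and CountingKey below is a
hypothetical stand-in for ServerName's case-insensitive ordering): every
round-robin step re-walks the skip list through the key comparator and
allocates a snapshot entry, which is exactly where the samples land when the
loop spins.

    import java.util.Map;
    import java.util.concurrent.ConcurrentSkipListMap;
    import java.util.concurrent.atomic.AtomicLong;

    public class RoundRobinOverSkipList {
        static final AtomicLong comparisons = new AtomicLong();

        // Hypothetical stand-in for ServerName: natural order is case-insensitive.
        static final class CountingKey implements Comparable<CountingKey> {
            final String name;
            CountingKey(String name) { this.name = name; }
            @Override public int compareTo(CountingKey other) {
                comparisons.incrementAndGet();
                return name.compareToIgnoreCase(other.name);
            }
            @Override public String toString() { return name; }
        }

        public static void main(String[] args) {
            ConcurrentSkipListMap<CountingKey, String> fairQueues = new ConcurrentSkipListMap<>();
            for (int i = 0; i < 16; i++) {
                fairQueues.put(new CountingKey("server-" + i), "queue-" + i);
            }
            comparisons.set(0); // ignore comparisons done during insertion

            CountingKey current = null;
            for (int i = 0; i < 1000; i++) {
                // Round-robin step: successor of the current key, wrapping to the first.
                Map.Entry<CountingKey, String> next =
                    (current == null) ? fairQueues.firstEntry() : fairQueues.higherEntry(current);
                if (next == null) {
                    next = fairQueues.firstEntry(); // wrapped past the last key
                }
                current = next.getKey(); // each step also allocated a new snapshot entry
            }
            System.out.println("comparator calls for 1000 round-robin steps: " + comparisons.get());
        }
    }
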
>>>>>>> Just me? Someone working on ProcedureV2 should have a look at WTH the
>>>>>>> ProcedureExecutors are doing. They seem to be polling the run queue
>>>>>>> constantly and burning up CPU. I'll try running more workloads at the
>>>>>>> next opportunity and see if I can repeat this or gain more insight.
>>>>>>>
>>>>>>> --
>>>>>>> Best regards,
>>>>>>>
>>>>>>> - Andy
>>>>>>>
>>>>>>> Problems worthy of attack prove their worth by hitting back. - Piet Hein
>>>>>>> (via Tom White)
>
> --
> Best regards,
>
> - Andy
>
> Problems worthy of attack prove their worth by hitting back. - Piet Hein
> (via Tom White)

--
Best regards,

- Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein
(via Tom White)
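
Finally, as a generic point of comparison for the "polling the run queue
constantly" observation (a plain java.util.concurrent sketch, not the
ProcedureExecutor or MasterProcedureQueue implementation): a consumer that
parks on a Condition until work is signalled, as Matteo describes for
waitCond.await(), spends no CPU and allocates nothing while the queue is
idle, whereas a loop that re-polls on every pass does.

    import java.util.ArrayDeque;
    import java.util.Queue;
    import java.util.concurrent.locks.Condition;
    import java.util.concurrent.locks.ReentrantLock;

    public class AwaitGuardedQueue<T> {
        private final Queue<T> queue = new ArrayDeque<>();
        private final ReentrantLock lock = new ReentrantLock();
        private final Condition notEmpty = lock.newCondition();

        public void add(T item) {
            lock.lock();
            try {
                queue.add(item);
                notEmpty.signal(); // wake one waiting worker
            } finally {
                lock.unlock();
            }
        }

        // Blocks until an item is available instead of spinning on poll().
        public T take() throws InterruptedException {
            lock.lock();
            try {
                while (queue.isEmpty()) {
                    notEmpty.await(); // worker parks here; no CPU burned while idle
                }
                return queue.poll();
            } finally {
                lock.unlock();
            }
        }
    }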