Return-Path: X-Original-To: archive-asf-public-internal@cust-asf2.ponee.io Delivered-To: archive-asf-public-internal@cust-asf2.ponee.io Received: from cust-asf.ponee.io (cust-asf.ponee.io [163.172.22.183]) by cust-asf2.ponee.io (Postfix) with ESMTP id 9F6BC200B71 for ; Wed, 31 Aug 2016 19:11:09 +0200 (CEST) Received: by cust-asf.ponee.io (Postfix) id 9DB13160AB5; Wed, 31 Aug 2016 17:11:09 +0000 (UTC) Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by cust-asf.ponee.io (Postfix) with SMTP id F3C91160AA7 for ; Wed, 31 Aug 2016 19:11:07 +0200 (CEST) Received: (qmail 94060 invoked by uid 500); 31 Aug 2016 17:11:07 -0000 Mailing-List: contact user-help@accumulo.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: user@accumulo.apache.org Delivered-To: mailing list user@accumulo.apache.org Received: (qmail 94045 invoked by uid 99); 31 Aug 2016 17:11:07 -0000 Received: from pnap-us-west-generic-nat.apache.org (HELO spamd4-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 31 Aug 2016 17:11:07 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd4-us-west.apache.org (ASF Mail Server at spamd4-us-west.apache.org) with ESMTP id 8DABDC031E for ; Wed, 31 Aug 2016 17:11:06 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd4-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: 1.199 X-Spam-Level: * X-Spam-Status: No, score=1.199 tagged_above=-999 required=6.31 tests=[DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, HTML_MESSAGE=2, RCVD_IN_DNSWL_LOW=-0.7, RCVD_IN_MSPIKE_H2=-0.001] autolearn=disabled Authentication-Results: spamd4-us-west.apache.org (amavisd-new); dkim=pass (1024-bit key) header.d=teralytics.ch Received: from mx1-lw-eu.apache.org ([10.40.0.8]) by localhost (spamd4-us-west.apache.org [10.40.0.11]) (amavisd-new, port 10024) with ESMTP id ZAVkKBbRp_xA for ; Wed, 31 Aug 2016 17:11:02 +0000 (UTC) Received: from mail-ua0-f171.google.com (mail-ua0-f171.google.com [209.85.217.171]) by mx1-lw-eu.apache.org (ASF Mail Server at mx1-lw-eu.apache.org) with ESMTPS id 9DABF5FE3D for ; Wed, 31 Aug 2016 17:11:01 +0000 (UTC) Received: by mail-ua0-f171.google.com with SMTP id q42so41696429uaq.1 for ; Wed, 31 Aug 2016 10:11:01 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=teralytics.ch; s=google; h=mime-version:in-reply-to:references:from:date:message-id:subject:to; bh=KimmW7bM+5Qz01Ot4mDeMKsEPr5ePliaTqEaGlkGmQ0=; b=Nz5jwALTG824mDWjewVTwe7VCZ8DGLSZaUB3IwSkF4z+YysuuAajUCQa4SXajLL3+H 8WLA/4TyKMGKuVuEygJuwm+AbJb2wU+C1uwo9K5oIP0OFYbsvRj8tRkQzS4klcTrVQTL iLks6hx345fR+GeLlRP3y/lVJ48iK0n6QpAeQ= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:in-reply-to:references:from:date :message-id:subject:to; bh=KimmW7bM+5Qz01Ot4mDeMKsEPr5ePliaTqEaGlkGmQ0=; b=gpRaey6JNcVeofWo4+YuJNu5sI0hsLq80lODNzks2CAbfZG61HBYPooUB79Txr0MEF +91rhoF8P79pPF5XoiKkzKB31LV0ciak5uU/CKN736TYKLMJHmWbjXEn5MKLVD7slR+9 PaGhxxDiQBcJ4Gcz5Uf5uLnEW22SNyqbbrd35MGry5wsiLaUxRuA6cT8mhpOx4MTGDfx Ps5GbQ3ugirfTRr5ZtTVB4lAZXM+qrZ/9qQ0mBMR3sXF6wHCZ/gUY99JM3uV49z2g1yT j6yLnn67zbEwwDfIXTYHXiaTLG49pnNmFdVxLk3Eww8+zZjbUVkPb+jf9m+yw5cLU3aB UULQ== X-Gm-Message-State: AE9vXwNFkwNUtREiG3fjb1R4iJFEV2RimTfpM8TSrgaZBg3uIKsZ8/em+BVikxGdX0oKG+rVoCfGwBo4yLi6VA== X-Received: by 10.159.35.85 with SMTP id 79mr5017527uae.124.1472663460324; Wed, 31 Aug 2016 10:11:00 -0700 (PDT) MIME-Version: 1.0 Received: by 10.103.50.138 with HTTP; Wed, 31 Aug 2016 10:10:59 -0700 (PDT) In-Reply-To: <57C44F69.8050506@gmail.com> References: <57C3463A.8000607@gmail.com> <57C44F69.8050506@gmail.com> From: Mario Pastorelli Date: Wed, 31 Aug 2016 19:10:59 +0200 Message-ID: Subject: Re: Profile a (batch) scan To: user@accumulo.apache.org Content-Type: multipart/alternative; boundary=001a113ad4986d431e053b612e26 archived-at: Wed, 31 Aug 2016 17:11:09 -0000 --001a113ad4986d431e053b612e26 Content-Type: text/plain; charset=UTF-8 Below you can find a good example of what I'm struggling to understand right now. It's a trace for a simple scan over some columns with a BatchScanner using 75 threads. The scan takes 877 milliseconds and the main contributor is the entry "tablet read ahead 1", which starts at 248 ms. These are the questions that I cannot answer with this trace: 1. why this heavy operation starts after 248ms? By summing up the delay before this operation you get a number which is not even close to 248ms. 2. what does "tablet read ahead 1" means? In general, how to map the entries of a trace to their meaning? Is there a guide about this? 3. why "tablet read ahead 1" takes 600ms? It's clearly not the sum of the entries under this one but that's the important part. 4. I may be naive but...how much data have been read by this scan? How many entries? That's very important to understand what's going on. Thanks for the help, Mario 877+ 0 Dice@h01 counts 2+ 7 tserver@h12 startScan 6+ 10 tserver@h15 startScan 5+ 11 tserver@h15 metadata tablets read ahead 4 843+ 34 Dice@h01 batch scanner 74- 1 620+ 230 tserver@h09 startMultiScan 600+ 248 tserver@h09 tablet read ahead 1 22+ 299 tserver@h09 newDFSInputStream 22+ 299 tserver@h09 getBlockLocations 2+ 310 tserver@h09 ClientNamenodeProtocol#getBlockLocations 1+ 321 tserver@h09 getFileInfo 1+ 321 tserver@h09 ClientNamenodeProtocol#getFileInfo 2+ 322 tserver@h09 DFSInputStream#byteArrayRead 1+ 324 tserver@h09 DFSInputStream#byteArrayRead 2+ 831 tserver@h09 DFSInputStream#byteArrayRead 2+ 834 tserver@h09 DFSInputStream#byteArrayRead 1+ 835 tserver@h09 BlockReaderLocal#fillBuffer(1091850413) 1+ 874 tserver@h09 closeMultiScan On Mon, Aug 29, 2016 at 5:06 PM, Josh Elser wrote: > Right, no, I understand. > > I just meant that the metrics we do expose are lacking. It would be a huge > benefit to everyone if we could find more things that we can expose, do > that once, and then prevent the need for the next person to hand-roll some > things like you are doing now :) > > Mario Pastorelli wrote: > >> I think suggestions in this mailing list are useful, Josh, that's why I >> keep asking questions. I'm sorry that I'm asking so many questions but >> I'm trying to improve my knowledge of Accumulo and documentation is >> limited. Ideally, I would like to use only the metrics provided by >> Accumulo, because that's less stuff that I have to maintain. The >> StopWatch writing to the tracer could help. >> >> >> >> On Sun, Aug 28, 2016 at 10:14 PM, Josh Elser > > wrote: >> >> I know it's not a super-helpful response, but I would love to help >> you work through things we *can* expose and help you do that. >> >> I imagine there is significantly more that we can add into the >> dist-tracing information for BatchScanners now which would give more >> insight into the tserver (amount of data read, number of ranges per >> scan RPC, amount of data returned). This would be ideal as it would >> prevent you from having to update your application code (although, >> the suggestion of writing some iterator for timing purposes is a >> simple way to move forward) >> >> Mario Pastorelli wrote: >> >> I would like to understand the performance of a batch scan and I >> would >> like to have some hints on how to proceed. I have enabled the >> distributed trace, and it tells me that some batch scanner >> threads take >> much more time than others to complete but this is not helpful >> enough >> because it's not telling me why some threads take more. My gut >> feeling >> is that one batch thread is scanning more data than the others, >> which >> means that the data is not well distributed for a query, but I >> use a >> random shard byte as prefix of the keys which should guarantee >> that data >> of the same range is almost equally distributed among the >> tservers. I >> enabled JMX on the tservers and attached jvisualvm to get an >> idea of the >> state of each tserver but I couldn't find anything meaningful. I >> would >> like to know if there is a way to profile what's going on on a >> single >> tserver for a single scan thread and by this I mean: >> >> 1. where are the tablets required by a scan? Which tablet >> server? >> 2. how fast was the lookups on the index for that scan? >> 3. how many bytes/records were read for that scan without the >> iterators >> 4. how many seeks are done by the scan and possibly why >> >> The main Accumulo UI is fine to get an overview of Accumulo but >> don't >> really give you any information about the performance of a >> single query >> and it seems to me that they are heavily affected by what >> iterators do. >> Profiling a single scan is much more interesting. Is there a way >> to >> profile a single (batch) scan in Accumulo such that I have a >> complete >> overview of the entire process of reading and sending back >> records to >> the driver? >> >> Thanks, >> Mario >> >> -- >> Mario Pastorelli| TERALYTICS >> >> *software engineer* >> >> Teralytics AG | Zollstrasse 62 | 8005 Zurich | Switzerland >> phone:+41794381682 >> email: mario.pastorelli@teralytics.ch >> >> > >> > >> www.teralytics.net >> >> >> Company registration number: CH-020.3.037.709-7 | Trade register >> Canton >> Zurich >> Board of directors: Georg Polzer, Luciano Franceschina, Mark >> Schmitz, >> Yann de Vries >> >> This e-mail message contains confidential information which is >> for the >> sole attention and use of the intended recipient. Please notify >> us at >> once if you think that it may not be intended for you and delete >> it >> immediately. >> >> >> >> >> -- >> Mario Pastorelli| TERALYTICS >> >> *software engineer* >> >> Teralytics AG | Zollstrasse 62 | 8005 Zurich | Switzerland >> phone:+41794381682 >> email: mario.pastorelli@teralytics.ch >> >> www.teralytics.net >> >> Company registration number: CH-020.3.037.709-7 | Trade register Canton >> Zurich >> Board of directors: Georg Polzer, Luciano Franceschina, Mark Schmitz, >> Yann de Vries >> >> This e-mail message contains confidential information which is for the >> sole attention and use of the intended recipient. Please notify us at >> once if you think that it may not be intended for you and delete it >> immediately. >> >> -- Mario Pastorelli | TERALYTICS *software engineer* Teralytics AG | Zollstrasse 62 | 8005 Zurich | Switzerland phone: +41794381682 email: mario.pastorelli@teralytics.ch www.teralytics.net Company registration number: CH-020.3.037.709-7 | Trade register Canton Zurich Board of directors: Georg Polzer, Luciano Franceschina, Mark Schmitz, Yann de Vries This e-mail message contains confidential information which is for the sole attention and use of the intended recipient. Please notify us at once if you think that it may not be intended for you and delete it immediately. --001a113ad4986d431e053b612e26 Content-Type: text/html; charset=UTF-8 Content-Transfer-Encoding: quoted-printable
Below you can find a good example of what I'm struggli= ng to understand right now. It's a trace for a simple scan over some co= lumns with a BatchScanner using 75 threads. The scan takes 877 milliseconds= and the main contributor is the entry "tablet read ahead 1", whi= ch starts at 248 ms. These are the questions that I cannot answer with this= trace:
  1. why this heavy operation starts after 248ms? By summing = up the delay before this operation you get a number which is not even close= to 248ms.
  2. what does "tablet read ahead 1" means? In gene= ral, how to map the entries of a trace to their meaning? Is there a guide a= bout this?
  3. why "tablet read ahead 1" takes 600ms? It'= s clearly not the sum of the entries under this one but that's the impo= rtant part.
  4. I may be naive but...how much data have been read by th= is scan? How many entries? That's very important to understand what'= ;s going on.

Thanks for the help,

Mario

=
tserver@h09= = = =
8= 77+0Dice@h01c= ounts
2+7tserver@h12startScan
6+10tserver@= h15startScan
5+11tserver@h15metadata tablets read ahead 4
8= 43+34Dice@h01batch scanner 74- 1
620+230startMultiScan
600+248tserver@h09tablet read ahead 1
22+299tserver@h09new= DFSInputStream
22+299tserver@h09getBlockLocations
2+310t= server@h09ClientNameno= deProtocol#getBlockLocations
1+321tserver= @h09getFileInfo
1+<= /td>321tserver@h09ClientNamenodeProtocol#getFileInfo
2+322DFSInputS= tream#byteArrayRead
1+324tserver@h09DFSInputStream#byteArrayRead
2+831tserver@h09= DFSInputStream#byteArrayRead
2+834tserver@h09DFSInputStream#byteArrayRead
1+835tserver@h09BlockReaderLocal#fillBuffer(1091850413)
1+= 874tserver@h09closeMultiScan


On Mon, Aug 29, 2016 at 5:0= 6 PM, Josh Elser <josh.elser@gmail.com> wrote:
Right, no, I understand.

I just meant that the metrics we do expose are lacking. It would be a huge = benefit to everyone if we could find more things that we can expose, do tha= t once, and then prevent the need for the next person to hand-roll some thi= ngs like you are doing now :)

Mario Pastorelli wrote:
I think suggestions in this mailing list are useful, Josh, that's why I=
keep asking questions. I'm sorry that I'm asking so many questions = but
I'm trying to improve my knowledge of Accumulo and documentation is
limited. Ideally, I would like to use only the metrics provided by
Accumulo, because that's less stuff that I have to maintain. The
StopWatch writing to the tracer could help.



On Sun, Aug 28, 2016 at 10:14 PM, Josh Elser <josh.elser@gmail.com
<mailto:josh.e= lser@gmail.com>> wrote:

=C2=A0 =C2=A0 I know it's not a super-helpful response, but I would lov= e to help
=C2=A0 =C2=A0 you work through things we *can* expose and help you do that.=

=C2=A0 =C2=A0 I imagine there is significantly more that we can add into th= e
=C2=A0 =C2=A0 dist-tracing information for BatchScanners now which would gi= ve more
=C2=A0 =C2=A0 insight into the tserver (amount of data read, number of rang= es per
=C2=A0 =C2=A0 scan RPC, amount of data returned). This would be ideal as it= would
=C2=A0 =C2=A0 prevent you from having to update your application code (alth= ough,
=C2=A0 =C2=A0 the suggestion of writing some iterator for timing purposes i= s a
=C2=A0 =C2=A0 simple way to move forward)

=C2=A0 =C2=A0 Mario Pastorelli wrote:

=C2=A0 =C2=A0 =C2=A0 =C2=A0 I would like to understand the performance of a= batch scan and I
=C2=A0 =C2=A0 =C2=A0 =C2=A0 would
=C2=A0 =C2=A0 =C2=A0 =C2=A0 like to have some hints on how to proceed. I ha= ve enabled the
=C2=A0 =C2=A0 =C2=A0 =C2=A0 distributed trace, and it tells me that some ba= tch scanner
=C2=A0 =C2=A0 =C2=A0 =C2=A0 threads take
=C2=A0 =C2=A0 =C2=A0 =C2=A0 much more time than others to complete but this= is not helpful
=C2=A0 =C2=A0 =C2=A0 =C2=A0 enough
=C2=A0 =C2=A0 =C2=A0 =C2=A0 because it's not telling me why some thread= s take more. My gut
=C2=A0 =C2=A0 =C2=A0 =C2=A0 feeling
=C2=A0 =C2=A0 =C2=A0 =C2=A0 is that one batch thread is scanning more data = than the others,
=C2=A0 =C2=A0 =C2=A0 =C2=A0 which
=C2=A0 =C2=A0 =C2=A0 =C2=A0 means that the data is not well distributed for= a query, but I use a
=C2=A0 =C2=A0 =C2=A0 =C2=A0 random shard byte as prefix of the keys which s= hould guarantee
=C2=A0 =C2=A0 =C2=A0 =C2=A0 that data
=C2=A0 =C2=A0 =C2=A0 =C2=A0 of the same range is almost equally distributed= among the
=C2=A0 =C2=A0 =C2=A0 =C2=A0 tservers. I
=C2=A0 =C2=A0 =C2=A0 =C2=A0 enabled JMX on the tservers and attached jvisua= lvm to get an
=C2=A0 =C2=A0 =C2=A0 =C2=A0 idea of the
=C2=A0 =C2=A0 =C2=A0 =C2=A0 state of each tserver but I couldn't find a= nything meaningful. I
=C2=A0 =C2=A0 =C2=A0 =C2=A0 would
=C2=A0 =C2=A0 =C2=A0 =C2=A0 like to know if there is a way to profile what&= #39;s going on on a
=C2=A0 =C2=A0 =C2=A0 =C2=A0 single
=C2=A0 =C2=A0 =C2=A0 =C2=A0 tserver for a single scan thread and by this I = mean:

=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 1. where are the tablets required by a s= can? Which tablet server?
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 2. how fast was the lookups on the index= for that scan?
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 3. how many bytes/records were read for = that scan without the
=C2=A0 =C2=A0 =C2=A0 =C2=A0 iterators
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 4. how many seeks are done by the scan a= nd possibly why

=C2=A0 =C2=A0 =C2=A0 =C2=A0 The main Accumulo UI is fine to get an overview= of Accumulo but
=C2=A0 =C2=A0 =C2=A0 =C2=A0 don't
=C2=A0 =C2=A0 =C2=A0 =C2=A0 really give you any information about the perfo= rmance of a
=C2=A0 =C2=A0 =C2=A0 =C2=A0 single query
=C2=A0 =C2=A0 =C2=A0 =C2=A0 and it seems to me that they are heavily affect= ed by what
=C2=A0 =C2=A0 =C2=A0 =C2=A0 iterators do.
=C2=A0 =C2=A0 =C2=A0 =C2=A0 Profiling a single scan is much more interestin= g. Is there a way to
=C2=A0 =C2=A0 =C2=A0 =C2=A0 profile a single (batch) scan in Accumulo such = that I have a
=C2=A0 =C2=A0 =C2=A0 =C2=A0 complete
=C2=A0 =C2=A0 =C2=A0 =C2=A0 overview of the entire process of reading and s= ending back
=C2=A0 =C2=A0 =C2=A0 =C2=A0 records to
=C2=A0 =C2=A0 =C2=A0 =C2=A0 the driver?

=C2=A0 =C2=A0 =C2=A0 =C2=A0 Thanks,
=C2=A0 =C2=A0 =C2=A0 =C2=A0 Mario

=C2=A0 =C2=A0 =C2=A0 =C2=A0 --
=C2=A0 =C2=A0 =C2=A0 =C2=A0 Mario Pastorelli| TERALYTICS

=C2=A0 =C2=A0 =C2=A0 =C2=A0 *software engineer*

=C2=A0 =C2=A0 =C2=A0 =C2=A0 Teralytics AG | Zollstrasse 62 | 8005 Zurich | = Switzerland
=C2=A0 =C2=A0 =C2=A0 =C2=A0 phone:+41794381682 <tel:%2B41794381682><= br> =C2=A0 =C2=A0 =C2=A0 =C2=A0 email: mario.pastorelli@teralytics.ch
=C2=A0 =C2=A0 =C2=A0 =C2=A0 <mailto:mario.pastorelli@teralytics.ch>=
=C2=A0 =C2=A0 =C2=A0 =C2=A0 <mailto:mario.pastorelli@teralytics.ch

=C2=A0 =C2=A0 =C2=A0 =C2=A0 <mailto:mario.pastorelli@teralytics.ch>= >
=C2=A0 =C2=A0 =C2=A0 =C2=A0 www.teralytics.net <http://www.teralytic= s.net>
=C2=A0 =C2=A0 =C2=A0 =C2=A0 <http://www.teralytics.net/>

=C2=A0 =C2=A0 =C2=A0 =C2=A0 Company registration number: CH-020.3.037.709-7= | Trade register
=C2=A0 =C2=A0 =C2=A0 =C2=A0 Canton
=C2=A0 =C2=A0 =C2=A0 =C2=A0 Zurich
=C2=A0 =C2=A0 =C2=A0 =C2=A0 Board of directors: Georg Polzer, Luciano Franc= eschina, Mark
=C2=A0 =C2=A0 =C2=A0 =C2=A0 Schmitz,
=C2=A0 =C2=A0 =C2=A0 =C2=A0 Yann de Vries

=C2=A0 =C2=A0 =C2=A0 =C2=A0 This e-mail message contains confidential infor= mation which is
=C2=A0 =C2=A0 =C2=A0 =C2=A0 for the
=C2=A0 =C2=A0 =C2=A0 =C2=A0 sole attention and use of the intended recipien= t. Please notify
=C2=A0 =C2=A0 =C2=A0 =C2=A0 us at
=C2=A0 =C2=A0 =C2=A0 =C2=A0 once if you think that it may not be intended f= or you and delete it
=C2=A0 =C2=A0 =C2=A0 =C2=A0 immediately.




--
Mario Pastorelli| TERALYTICS

*software engineer*

Teralytics AG | Zollstrasse 62 | 8005 Zurich | Switzerland
phone:+41794381682
email: = mario.pastorelli@teralytics.ch
<mailto:mario.pastorelli@teralytics.ch>
= www.teralytics.net <http://www.teralytics.net/>

Company registration number: CH-020.3.037.709-7 | Trade register Canton
Zurich
Board of directors: Georg Polzer, Luciano Franceschina, Mark Schmitz,
Yann de Vries

This e-mail message contains confidential information which is for the
sole attention and use of the intended recipient. Please notify us at
once if you think that it may not be intended for you and delete it
immediately.




--
Ma= rio Pastorelli | TERALYTICS

software engineer<= /span>

Ter= alytics AG |=C2=A0Zollstrasse 62 | 8005 Zurich=C2=A0| Switzerland=C2=A0
phone: +41794381682
email: mario.pastorelli@teralytics.ch

www.teralytics.net

Company registration number: CH-020.3= .037.709-7 | Trade register Canton Zurich
Board of directors: Georg Polzer, Luc= iano Franceschina, Mark Schmitz, Yann de Vries

This e-mail message contains confidential i= nformation which is for the sole attention and use of the intended recipient. Please notify us at once = if you think that it may not be intended for you and delete it immediately.

--001a113ad4986d431e053b612e26--