From: Mario Pastorelli
Date: Mon, 29 Aug 2016 11:18:00 +0200
Subject: Re: Profile a (batch) scan
To: user@accumulo.apache.org
Reply-To: user@accumulo.apache.org
In-Reply-To: <57C3463A.8000607@gmail.com>
References: <57C3463A.8000607@gmail.com>
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary=001a114dc7e83c5c1e053b325755

--001a114dc7e83c5c1e053b325755
Content-Type: text/plain; charset=UTF-8

I think the suggestions on this mailing list are useful, Josh; that's why
I keep asking questions. I'm sorry for asking so many, but I'm trying to
improve my knowledge of Accumulo and the documentation is limited.
Ideally, I would like to use only the metrics provided by Accumulo,
because that's less stuff that I have to maintain. The StopWatch writing
to the tracer could help.

On Sun, Aug 28, 2016 at 10:14 PM, Josh Elser wrote:

> I know it's not a super-helpful response, but I would love to help you
> work through things we *can* expose and help you do that.
>
> I imagine there is significantly more that we can add into the
> dist-tracing information for BatchScanners now which would give more
> insight into the tserver (amount of data read, number of ranges per scan
> RPC, amount of data returned). This would be ideal as it would prevent you
> from having to update your application code (although the suggestion of
> writing some iterator for timing purposes is a simple way to move forward).
>
> Mario Pastorelli wrote:
>
>> I would like to understand the performance of a batch scan and I would
>> like to have some hints on how to proceed. I have enabled the
>> distributed trace, and it tells me that some batch scanner threads take
>> much more time than others to complete, but this is not helpful enough
>> because it doesn't tell me why some threads take longer. My gut feeling
>> is that one batch thread is scanning more data than the others, which
>> means that the data is not well distributed for a query, but I use a
>> random shard byte as prefix of the keys, which should guarantee that data
>> of the same range is almost equally distributed among the tservers. I
>> enabled JMX on the tservers and attached jvisualvm to get an idea of the
>> state of each tserver, but I couldn't find anything meaningful. I would
>> like to know if there is a way to profile what's going on on a single
>> tserver for a single scan thread, and by this I mean:
>>
>> 1. Where are the tablets required by a scan? Which tablet server?
>> 2. How fast were the lookups on the index for that scan?
>> 3. How many bytes/records were read for that scan without the iterators?
>> 4. How many seeks are done by the scan, and possibly why?
>>
>> The main Accumulo UI is fine to get an overview of Accumulo but doesn't
>> really give you any information about the performance of a single query,
>> and it seems to me that the numbers it shows are heavily affected by what
>> iterators do. Profiling a single scan is much more interesting. Is there
>> a way to profile a single (batch) scan in Accumulo such that I have a
>> complete overview of the entire process of reading and sending back
>> records to the driver?
>>
>> Thanks,
>> Mario
>>
>> --
>> Mario Pastorelli | TERALYTICS
>>
>> *software engineer*
>>
>> Teralytics AG | Zollstrasse 62 | 8005 Zurich | Switzerland
>> phone: +41794381682
>> email: mario.pastorelli@teralytics.ch
>>
>> www.teralytics.net
>>
>> Company registration number: CH-020.3.037.709-7 | Trade register Canton
>> Zurich
>> Board of directors: Georg Polzer, Luciano Franceschina, Mark Schmitz,
>> Yann de Vries
>>
>> This e-mail message contains confidential information which is for the
>> sole attention and use of the intended recipient. Please notify us at
>> once if you think that it may not be intended for you and delete it
>> immediately.

--
Mario Pastorelli | TERALYTICS
*software engineer*

--001a114dc7e83c5c1e053b325755
Content-Type: text/html; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
--001a114dc7e83c5c1e053b325755--
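
The "iterator for timing purposes" suggested in the quoted reply could look roughly like the sketch below. It is plain Java and deliberately not Accumulo API: `TimedIterator` and its accessors are hypothetical names, and it wraps a `java.util.Iterator` over byte arrays as a stand-in for the stream of values a scan would read. In a real deployment the same counting would presumably live in a server-side iterator, and the totals would be reported through whatever tracing or logging is already in place. The sketch only shows which numbers are worth collecting per scan: entries read, bytes read, and time spent in the underlying source.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

/** Hypothetical stand-in for a timing iterator; not part of the Accumulo API. */
class TimedIterator implements Iterator<byte[]> {
    private final Iterator<byte[]> source;
    private long entries; // number of values handed to the caller
    private long bytes;   // total size of those values
    private long nanos;   // time spent inside the wrapped iterator

    TimedIterator(Iterator<byte[]> source) {
        this.source = source;
    }

    @Override
    public boolean hasNext() {
        long start = System.nanoTime();
        try {
            return source.hasNext();
        } finally {
            nanos += System.nanoTime() - start;
        }
    }

    @Override
    public byte[] next() {
        long start = System.nanoTime();
        byte[] value;
        try {
            value = source.next();
        } finally {
            nanos += System.nanoTime() - start;
        }
        // Count only values that were actually returned.
        entries++;
        bytes += value.length;
        return value;
    }

    long entries() { return entries; }
    long bytes() { return bytes; }
    long nanos() { return nanos; }

    public static void main(String[] args) {
        // Two fake values stand in for the data one scan thread would read.
        List<byte[]> fakeScan = Arrays.asList(new byte[] {1, 2, 3}, new byte[] {4});
        TimedIterator it = new TimedIterator(fakeScan.iterator());
        while (it.hasNext()) {
            it.next();
        }
        System.out.printf("entries=%d bytes=%d nanos=%d%n",
                it.entries(), it.bytes(), it.nanos());
    }
}
```

Comparing these totals across the threads of one batch scan would show whether the slow threads really read more data, or read the same amount more slowly.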