accumulo-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Josh Elser <josh.el...@gmail.com>
Subject Re: Profile a (batch) scan
Date Mon, 29 Aug 2016 15:06:17 GMT
Right, no, I understand.

I just meant that the metrics we do expose are lacking. It would be a 
huge benefit to everyone if we could find more things that we can 
expose, do that once, and then prevent the need for the next person to 
hand-roll some things like you are doing now :)

Mario Pastorelli wrote:
> I think suggestions in this mailing list are useful, Josh, that's why I
> keep asking questions. I'm sorry that I'm asking so many questions but
> I'm trying to improve my knowledge of Accumulo and documentation is
> limited. Ideally, I would like to use only the metrics provided by
> Accumulo, because that's less stuff that I have to maintain. The
> StopWatch writing to the tracer could help.
>
>
>
> On Sun, Aug 28, 2016 at 10:14 PM, Josh Elser <josh.elser@gmail.com
> <mailto:josh.elser@gmail.com>> wrote:
>
>     I know it's not a super-helpful response, but I would love to help
>     you work through things we *can* expose and help you do that.
>
>     I imagine there is significantly more that we can add into the
>     dist-tracing information for BatchScanners now which would give more
>     insight into the tserver (amount of data read, number of ranges per
>     scan RPC, amount of data returned). This would be ideal as it would
>     prevent you from having to update your application code (although,
>     the suggestion of writing some iterator for timing purposes is a
>     simple way to move forward)
>
>     Mario Pastorelli wrote:
>
>         I would like to understand the performance of a batch scan and I
>         would
>         like to have some hints on how to proceed. I have enabled the
>         distributed trace, and it tells me that some batch scanner
>         threads take
>         much more time than others to complete but this is not helpful
>         enough
>         because it's not telling me why some threads take more. My gut
>         feeling
>         is that one batch thread is scanning more data than the others,
>         which
>         means that the data is not well distributed for a query, but I use a
>         random shard byte as prefix of the keys which should guarantee
>         that data
>         of the same range is almost equally distributed among the
>         tservers. I
>         enabled JMX on the tservers and attached jvisualvm to get an
>         idea of the
>         state of each tserver but I couldn't find anything meaningful. I
>         would
>         like to know if there is a way to profile what's going on on a
>         single
>         tserver for a single scan thread and by this I mean:
>
>           1. where are the tablets required by a scan? Which tablet server?
>           2. how fast was the lookups on the index for that scan?
>           3. how many bytes/records were read for that scan without the
>         iterators
>           4. how many seeks are done by the scan and possibly why
>
>         The main Accumulo UI is fine to get an overview of Accumulo but
>         don't
>         really give you any information about the performance of a
>         single query
>         and it seems to me that they are heavily affected by what
>         iterators do.
>         Profiling a single scan is much more interesting. Is there a way to
>         profile a single (batch) scan in Accumulo such that I have a
>         complete
>         overview of the entire process of reading and sending back
>         records to
>         the driver?
>
>         Thanks,
>         Mario
>
>         --
>         Mario Pastorelli| TERALYTICS
>
>         *software engineer*
>
>         Teralytics AG | Zollstrasse 62 | 8005 Zurich | Switzerland
>         phone:+41794381682 <tel:%2B41794381682>
>         email: mario.pastorelli@teralytics.ch
>         <mailto:mario.pastorelli@teralytics.ch>
>         <mailto:mario.pastorelli@teralytics.ch
>         <mailto:mario.pastorelli@teralytics.ch>>
>         www.teralytics.net <http://www.teralytics.net>
>         <http://www.teralytics.net/>
>
>         Company registration number: CH-020.3.037.709-7 | Trade register
>         Canton
>         Zurich
>         Board of directors: Georg Polzer, Luciano Franceschina, Mark
>         Schmitz,
>         Yann de Vries
>
>         This e-mail message contains confidential information which is
>         for the
>         sole attention and use of the intended recipient. Please notify
>         us at
>         once if you think that it may not be intended for you and delete it
>         immediately.
>
>
>
>
> --
> Mario Pastorelli| TERALYTICS
>
> *software engineer*
>
> Teralytics AG | Zollstrasse 62 | 8005 Zurich | Switzerland
> phone:+41794381682
> email: mario.pastorelli@teralytics.ch
> <mailto:mario.pastorelli@teralytics.ch>
> www.teralytics.net <http://www.teralytics.net/>
>
> Company registration number: CH-020.3.037.709-7 | Trade register Canton
> Zurich
> Board of directors: Georg Polzer, Luciano Franceschina, Mark Schmitz,
> Yann de Vries
>
> This e-mail message contains confidential information which is for the
> sole attention and use of the intended recipient. Please notify us at
> once if you think that it may not be intended for you and delete it
> immediately.
>

Mime
View raw message