accumulo-user mailing list archives

From Jonathan Lasko <jla...@bbn.com>
Subject Re: Profile a (batch) scan
Date Fri, 26 Aug 2016 22:31:08 GMT
I haven't had time to dig into it yet, but I am hoping Zipkin will help 
with some of these insights. (Unless that is the distributed trace you 
were referring to?)

-Jonathan

On 08/26/2016 04:54 PM, Mario Pastorelli wrote:
> I would like to understand the performance of a batch scan and would 
> like some hints on how to proceed. I have enabled distributed 
> tracing, and it tells me that some batch scanner threads take much 
> longer than others to complete, but that is not helpful enough 
> because it does not tell me why those threads take longer. My gut 
> feeling is that one batch thread is scanning more data than the 
> others, which would mean that the data is not well distributed for a 
> query, but I use a random shard byte as a prefix of the keys, which 
> should guarantee that data in the same range is almost equally 
> distributed among the tservers. I enabled JMX on the tservers and 
> attached jvisualvm to get an idea of the state of each tserver, but I 
> couldn't find anything meaningful. I would like to know if there is a 
> way to profile what is going on in a single tserver for a single scan 
> thread, and by this I mean:
>
>  1. Where are the tablets required by a scan? Which tablet server?
>  2. How fast were the lookups on the index for that scan?
>  3. How many bytes/records were read for that scan without the iterators?
>  4. How many seeks are done by the scan, and possibly why?
>
> The main Accumulo UI is fine for getting an overview of Accumulo, but 
> it doesn't really give you any information about the performance of a 
> single query, and it seems to me that query performance is heavily 
> affected by what the iterators do. Profiling a single scan is much 
> more interesting. Is there a way to profile a single (batch) scan in 
> Accumulo such that I have a complete overview of the entire process of 
> reading and sending back records to the driver?
>
> Thanks,
> Mario
>
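
The quoted message assumes that a random shard byte prefix spreads the data of any range almost equally. A minimal sketch of how that assumption could be sanity-checked offline is below. It does not talk to Accumulo at all: the shard count, the `row...` key format, and the helper names `shard_key`, `shard_counts`, and `skew` are all hypothetical and chosen only for illustration. It compares the per-shard skew over the whole key set with the skew over the small subset of rows one query might touch.

```python
import random
from collections import Counter

NUM_SHARDS = 16  # hypothetical shard count; the thread does not state one


def shard_key(row_id: str, rng: random.Random) -> bytes:
    """Prefix the row ID with one randomly chosen shard byte."""
    shard = rng.randrange(NUM_SHARDS)
    return bytes([shard]) + row_id.encode("utf-8")


def shard_counts(keys):
    """Count how many keys fall under each shard byte."""
    return Counter(key[0] for key in keys)


def skew(counts, num_shards=NUM_SHARDS):
    """Busiest shard divided by the mean shard load (1.0 means perfectly even)."""
    mean = sum(counts.values()) / num_shards
    return max(counts.values()) / mean


rng = random.Random(42)  # fixed seed so repeated runs are comparable
keys = [shard_key(f"row{i:06d}", rng) for i in range(100_000)]

# Skew across the whole (simulated) table.
counts = shard_counts(keys)
print(f"skew over all keys:    {skew(counts):.3f}")

# Skew across only the rows a narrow query would touch (made-up row IDs).
query_keys = [k for k in keys if k[1:].startswith(b"row0000")]
query_counts = shard_counts(query_keys)
print(f"skew over query range: {skew(query_counts):.3f}")
```

With few rows per query, random assignment alone can leave one shard noticeably busier than the mean, so an even table-wide distribution does not by itself rule out per-query imbalance. Running the same tally on real keys would need the actual row IDs and shard scheme, which the thread does not give.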

