accumulo-user mailing list archives

From Josh Elser <>
Subject Re: Profile a (batch) scan
Date Sun, 28 Aug 2016 20:14:50 GMT
I know it's not a super-helpful response, but I would love to help you
work out what we *can* expose, and then help you expose it.

I imagine there is significantly more that we could add to the
dist-tracing information for BatchScanners now, which would give more
insight into the tserver (amount of data read, number of ranges per scan
RPC, amount of data returned). This would be ideal, as it would save you
from having to change your application code (although the suggestion of
writing an iterator for timing purposes is a simple way to move forward).
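The iterator for timing purposes mentioned above could look roughly like the sketch below. To keep it self-contained and runnable, it uses a hypothetical, simplified `SimpleIterator` interface with String keys and values instead of Accumulo's `SortedKeyValueIterator<Key,Value>` (whose `seek` takes a `Range`, a column-family collection and an `inclusive` flag). A real version would extend Accumulo's `WrappingIterator`, override `seek` and `next` the same way, and report its counters somewhere (for example, by logging them); how to report is left open here. Sitting directly on top of the data, counters like these would approximate the number of seeks and the records/bytes read before any other iterator filters anything.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class CountingIteratorSketch {

    // Hypothetical, heavily simplified stand-in for Accumulo's
    // SortedKeyValueIterator: keys and values are plain Strings here.
    interface SimpleIterator {
        void seek(String startInclusive, String endExclusive);
        boolean hasTop();
        String getTopKey();
        String getTopValue();
        void next();
    }

    // In-memory source so the sketch runs without a tablet server.
    static class MapSource implements SimpleIterator {
        private final TreeMap<String, String> data;
        private List<Map.Entry<String, String>> window = new ArrayList<>();
        private int pos = 0;

        MapSource(TreeMap<String, String> data) { this.data = data; }

        public void seek(String start, String end) {
            window = new ArrayList<>(data.subMap(start, true, end, false).entrySet());
            pos = 0;
        }
        public boolean hasTop() { return pos < window.size(); }
        public String getTopKey() { return window.get(pos).getKey(); }
        public String getTopValue() { return window.get(pos).getValue(); }
        public void next() { pos++; }
    }

    // Wraps another iterator and records the number of seeks, the entries and
    // bytes it produces, and the time spent inside the wrapped source's
    // seek/next calls (similar in spirit to a WrappingIterator subclass).
    static class CountingIterator implements SimpleIterator {
        private final SimpleIterator source;
        long seeks, entries, bytes, nanos;

        CountingIterator(SimpleIterator source) { this.source = source; }

        public void seek(String start, String end) {
            long t0 = System.nanoTime();
            source.seek(start, end);
            nanos += System.nanoTime() - t0;
            seeks++;
            countTop();
        }
        public void next() {
            long t0 = System.nanoTime();
            source.next();
            nanos += System.nanoTime() - t0;
            countTop();
        }
        // Count the entry now at the top of the source, if any.
        private void countTop() {
            if (source.hasTop()) {
                entries++;
                bytes += source.getTopKey().getBytes(StandardCharsets.UTF_8).length
                       + source.getTopValue().getBytes(StandardCharsets.UTF_8).length;
            }
        }
        public boolean hasTop() { return source.hasTop(); }
        public String getTopKey() { return source.getTopKey(); }
        public String getTopValue() { return source.getTopValue(); }
    }

    public static void main(String[] args) {
        TreeMap<String, String> data = new TreeMap<>();
        for (int i = 0; i < 10; i++) {
            data.put(String.format("row%02d", i), "v" + i);
        }
        CountingIterator it = new CountingIterator(new MapSource(data));
        it.seek("row02", "row05");
        while (it.hasTop()) {
            it.next();
        }
        System.out.println("seeks=" + it.seeks + " entries=" + it.entries
                + " bytes=" + it.bytes + " nanos=" + it.nanos);
    }
}
```

In Accumulo itself, such an iterator would be attached to the scanner with an `IteratorSetting`; since iterators are applied in order of priority (lowest first), giving it a lower priority than the other iterators should place it closest to the data.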

Mario Pastorelli wrote:
> I would like to understand the performance of a batch scan and I would
> like to have some hints on how to proceed. I have enabled
> distributed tracing, and it tells me that some batch scanner threads take
> much more time than others to complete, but this is not helpful enough
> because it doesn't tell me why some threads take longer. My gut feeling
> is that one batch thread is scanning more data than the others, which
> would mean that the data is not well distributed for a query, but I use a
> random shard byte as a prefix of the keys, which should guarantee that data
> in the same range is distributed almost equally among the tservers. I
> enabled JMX on the tservers and attached jvisualvm to get an idea of the
> state of each tserver, but I couldn't find anything meaningful. I would
> like to know if there is a way to profile what's going on in a single
> tserver for a single scan thread; by this I mean:
>  1. where are the tablets required by a scan? Which tablet server?
>  2. how fast were the lookups on the index for that scan?
>  3. how many bytes/records were read for that scan, before any iterators were applied?
>  4. how many seeks were done by the scan, and possibly why?
> The main Accumulo UI is fine for getting an overview of Accumulo, but it
> doesn't really give you any information about the performance of a single
> query, and it seems to me that query performance is heavily affected by
> what the iterators do. Profiling a single scan is much more interesting.
> Is there a way to profile a single (batch) scan in Accumulo such that I
> have a complete overview of the entire process of reading records and
> sending them back to the driver?
> Thanks,
> Mario
> --
> Mario Pastorelli| TERALYTICS
> *software engineer*
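
The random-shard-prefix layout described in the quoted message implies the read pattern sketched below: each logical range is fanned out into one range per shard byte, and each of those ranges may be served by a different tablet and tablet server. The key layout, shard count and helper names are hypothetical, and plain `byte[]` pairs stand in for Accumulo's `Range`. The `skew` helper is one simple way to compare per-shard or per-tserver counts (for example, counts gathered by a timing iterator as suggested in the reply): if entry counts come out roughly even but some batch scanner threads are still much slower, that suggests the imbalance lies in where the shard tablets are hosted or in per-tserver load rather than in the data distribution itself.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class ShardedRangesSketch {

    // Hypothetical physical range over raw row bytes: [start, end).
    static class RowRange {
        final byte[] start;
        final byte[] end;
        RowRange(byte[] start, byte[] end) { this.start = start; this.end = end; }
    }

    // Hypothetical key layout: one shard byte followed by the logical row.
    // Assumes numShards <= 256 so the shard fits in a single byte.
    static byte[] shardedRow(int shard, String logicalRow) {
        byte[] logical = logicalRow.getBytes(StandardCharsets.UTF_8);
        byte[] row = new byte[logical.length + 1];
        row[0] = (byte) shard;
        System.arraycopy(logical, 0, row, 1, logical.length);
        return row;
    }

    // On write: the shard byte is chosen at random.
    static byte[] rowForWrite(String logicalRow, int numShards, Random rnd) {
        return shardedRow(rnd.nextInt(numShards), logicalRow);
    }

    // On read: one logical range [start, end) becomes one range per shard,
    // which is the set of ranges a BatchScanner would be given.
    static List<RowRange> expand(String start, String end, int numShards) {
        List<RowRange> ranges = new ArrayList<>();
        for (int s = 0; s < numShards; s++) {
            ranges.add(new RowRange(shardedRow(s, start), shardedRow(s, end)));
        }
        return ranges;
    }

    // Max/mean ratio of per-shard (or per-tserver) counts; 1.0 means perfectly even.
    static double skew(long[] counts) {
        long max = 0, sum = 0;
        for (long c : counts) {
            max = Math.max(max, c);
            sum += c;
        }
        return sum == 0 ? 1.0 : (double) max * counts.length / sum;
    }

    public static void main(String[] args) {
        for (RowRange r : expand("2016-08-01", "2016-08-02", 4)) {
            String logicalStart = new String(r.start, 1, r.start.length - 1, StandardCharsets.UTF_8);
            System.out.println("shard " + r.start[0] + " from " + logicalStart);
        }
        System.out.println("skew = " + skew(new long[] {100, 100, 100, 400}));
    }
}
```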
