accumulo-user mailing list archives

From Mario Pastorelli <>
Subject Profile a (batch) scan
Date Fri, 26 Aug 2016 20:54:33 GMT
I would like to understand the performance of a batch scan and would
appreciate some hints on how to proceed. I have enabled distributed tracing,
and it tells me that some batch scanner threads take much longer than
others to complete, but that is not enough because it does not tell me why.
My gut feeling is that one batch thread is scanning more data than the
others, which would mean that the data is not well distributed for the
query. However, I use a random shard byte as a prefix of the keys, which
should guarantee that the data for a given range is almost equally
distributed among the tservers. I enabled JMX on the tservers and attached
jvisualvm to get an idea of the state of each tserver, but I couldn't find
anything meaningful. I would like to know if there is a way to profile
what is going on on a single tserver for a single scan thread, and by this
I mean:

   1. where are the tablets required by a scan? Which tablet server?
   2. how fast were the lookups on the index for that scan?
   3. how many bytes/records were read for that scan without the iterators?
   4. how many seeks were done by the scan and, if possible, why?

The main Accumulo UI is fine for getting an overview of Accumulo, but it
doesn't really give you any information about the performance of a single
query, and what it does show seems to me to be heavily affected by what the
iterators do. Profiling a single scan is much more interesting. Is there a
way to profile a single (batch) scan in Accumulo such that I get a complete
overview of the entire process of reading records and sending them back to
the driver?
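
To make the sharding assumption above concrete, here is a minimal sketch in
plain Java (no Accumulo calls) of prefixing each row with a random shard
byte and measuring how evenly the rows spread across shards. The class and
method names are hypothetical, and the shard count of 16 and the row names
are assumptions for illustration only, not my actual schema.

```java
import java.nio.charset.StandardCharsets;
import java.util.Random;

// Hypothetical helper, not part of Accumulo: models a one-byte shard prefix.
final class ShardSkew {

    /** Returns the row bytes with a single shard byte prepended. */
    static byte[] shardedRow(byte shard, String row) {
        byte[] body = row.getBytes(StandardCharsets.UTF_8);
        byte[] out = new byte[body.length + 1];
        out[0] = shard;
        System.arraycopy(body, 0, out, 1, body.length);
        return out;
    }

    /** Counts rows per shard; assumes every shard byte is below numShards. */
    static long[] countPerShard(byte[][] rows, int numShards) {
        long[] counts = new long[numShards];
        for (byte[] row : rows) {
            counts[row[0] & 0xFF]++;
        }
        return counts;
    }

    /** Largest shard divided by the mean shard size; 1.0 means perfectly even. */
    static double skew(long[] counts) {
        long max = 0;
        long total = 0;
        for (long c : counts) {
            max = Math.max(max, c);
            total += c;
        }
        return total == 0 ? 0.0 : max / ((double) total / counts.length);
    }

    public static void main(String[] args) {
        int numShards = 16;   // assumed shard count
        int numRows = 100_000; // assumed sample size
        Random rnd = new Random(42);
        byte[][] rows = new byte[numRows][];
        for (int i = 0; i < numRows; i++) {
            rows[i] = shardedRow((byte) rnd.nextInt(numShards), "row" + i);
        }
        System.out.printf("skew = %.3f%n", skew(countPerShard(rows, numShards)));
    }
}
```

A skew close to 1.0 would only show that rows are spread evenly across
shards; it says nothing about which tservers host the tablets for those
shards, which is the part I cannot see.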


Mario Pastorelli | TERALYTICS

