accumulo-user mailing list archives

From Mario Pastorelli <>
Subject Optimize Accumulo scan speed
Date Sun, 10 Apr 2016 15:05:09 GMT

I'm currently having some scan speed issues with Accumulo and I would like
to understand why and how I can solve them. I have geographical data and I
use as primary key the day followed by the geohex, which is a linearisation
of lat and lon. The reason for this key is that I always query the data for
one day but for a set of geohexes which represent a zone, so with this
schema I can use a single scan to read all the data for one day with few
seeks. My problem is that the scan is painfully slow: for instance, reading
5,617,019 rows takes around 17 seconds, with a scan speed of 13 MB/s, less
than 750k scan entries/s, and around 300 seeks. I enabled the tracer and
this is what I've got:

17325+0 Dice@srv1 Dice.query
11+1 Dice@srv1 scan
11+1 Dice@srv1 scan:location
5+13 Dice@srv1 scan
5+13 Dice@srv1 scan:location
4+19 Dice@srv1 scan
4+19 Dice@srv1 scan:location
5+23 Dice@srv1 scan
4+24 Dice@srv1 scan:location
I'm not sure how to speed up the scanning. I have the following questions:
  - is this speed normal?
  - can I involve more servers in the scan? Right now only two servers have
the ranges, but with a cluster of 15 machines it would be nice to involve
more of them. Is it possible?
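For reference, the key layout described above could be sketched as follows. This is a hypothetical reconstruction, assuming the row key is the fixed-width day string concatenated with a zero-padded geohex code (the actual encoding in my table may differ); it shows how a zone's set of geohexes collapses into a few contiguous ranges, i.e. few seeks per scan:

```java
import java.util.*;

public class GeohexRanges {
    // Hypothetical row-key layout: <yyyyMMdd><geohex>, with the geohex
    // zero-padded to a fixed width so keys sort correctly as strings.
    static String rowKey(String day, int geohex) {
        return day + String.format("%06d", geohex);
    }

    // Merge a sorted set of geohex codes into [startKey, endKey] pairs:
    // runs of adjacent codes become a single contiguous range (one seek).
    static List<String[]> ranges(String day, SortedSet<Integer> hexes) {
        List<String[]> out = new ArrayList<>();
        Integer start = null, prev = null;
        for (int h : hexes) {
            if (start == null) {
                start = h;
            } else if (h != prev + 1) {
                out.add(new String[]{rowKey(day, start), rowKey(day, prev)});
                start = h;
            }
            prev = h;
        }
        if (start != null) {
            out.add(new String[]{rowKey(day, start), rowKey(day, prev)});
        }
        return out;
    }

    public static void main(String[] args) {
        // A "zone" of geohexes: two contiguous runs -> two ranges.
        SortedSet<Integer> zone = new TreeSet<>(Arrays.asList(100, 101, 102, 250, 251));
        for (String[] r : ranges("20160410", zone)) {
            System.out.println(r[0] + " .. " + r[1]);
        }
    }
}
```

Each resulting pair would then be handed to the scanner as one range, so a zone of thousands of hexes still costs only as many seeks as it has contiguous runs.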


Mario Pastorelli | TERALYTICS

*software engineer*

Teralytics AG | Zollstrasse 62 | 8005 Zurich | Switzerland
phone: +41794381682

Company registration number: CH- | Trade register Canton
Board of directors: Georg Polzer, Luciano Franceschina, Mark Schmitz, Yann
de Vries

This e-mail message contains confidential information which is for the sole
attention and use of the intended recipient. Please notify us at once if
you think that it may not be intended for you and delete it immediately.
