Subject: Optimize Accumulo scan speed
From: Mario Pastorelli
To: user@accumulo.apache.org
Date: Sun, 10 Apr 2016 17:05:09 +0200

Hi,

I'm currently having some scan speed issues with Accumulo and I would like to understand why and how I can solve them. I have geographical data, and I use as primary key the day and then the geohex, which is a linearisation of lat and lon.
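A minimal sketch of the key scheme described above (this is a hypothetical illustration, not the actual code; the key layout, padding width, and the `KeyRange` type are all assumptions, and plain strings stand in for Accumulo `Range` objects so the example stays self-contained):

```java
import java.util.ArrayList;
import java.util.List;

public class GeohexKeys {

    /** Row key = day then geohex; zero-padding keeps lexicographic order
     *  consistent with numeric order ("10" must not sort before "9"). */
    static String rowKey(String day, long geohex) {
        return String.format("%s_%012d", day, geohex);
    }

    /** [startInclusive, endInclusive] row-key pair standing in for an Accumulo Range. */
    record KeyRange(String start, String end) {}

    /**
     * Merge a sorted list of geohexes for one day into as few contiguous
     * ranges as possible -- contiguous runs of geohexes collapse into a
     * single range, which is what keeps the number of seeks low.
     */
    static List<KeyRange> zoneRanges(String day, long[] sortedGeohexes) {
        List<KeyRange> ranges = new ArrayList<>();
        int i = 0;
        while (i < sortedGeohexes.length) {
            int j = i;
            // Extend the run while the geohexes are consecutive integers.
            while (j + 1 < sortedGeohexes.length
                    && sortedGeohexes[j + 1] == sortedGeohexes[j] + 1) {
                j++;
            }
            ranges.add(new KeyRange(rowKey(day, sortedGeohexes[i]),
                                    rowKey(day, sortedGeohexes[j])));
            i = j + 1;
        }
        return ranges;
    }

    public static void main(String[] args) {
        // A zone of 5 geohexes with one gap -> 2 ranges (2 seeks instead of 5).
        long[] zone = {100, 101, 102, 200, 201};
        for (KeyRange r : zoneRanges("20160410", zone)) {
            System.out.println(r.start() + " .. " + r.end());
        }
    }
}
```

In a real client these ranges would be handed to a BatchScanner for the table, one `Range` per merged run.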
The reason for this key is that I always query the data for one day but for a set of geohexes which represent a zone, so with this schema I can use a single scan to read all the data for one day with few seeks. My problem is that the scan is painfully slow: for instance, reading 5617019 rows takes around 17 seconds, with a scan speed of 13MB/s, less than 750k entries/s scanned, and around 300 seeks. I enabled the tracer and this is what I've got:

17325+0 Dice@srv1 Dice.query
11+1 Dice@srv1 scan
11+1 Dice@srv1 scan:location
5+13 Dice@srv1 scan
5+13 Dice@srv1 scan:location
4+19 Dice@srv1 scan
4+19 Dice@srv1 scan:location
5+23 Dice@srv1 scan
4+24 Dice@srv1 scan:location

I'm not sure how to speed up the scanning. I have the following questions:
  - is this speed normal?
  - can I involve more servers in the scan? Right now only two servers have the ranges, but with a cluster of 15 machines it would be nice to involve more of them. Is it possible?

Thanks,
Mario

--
Mario Pastorelli | TERALYTICS
software engineer

Teralytics AG | Zollstrasse 62 | 8005 Zurich | Switzerland
phone: +41794381682
email: mario.pastorelli@teralytics.ch
www.teralytics.net

Company registration number: CH-020.3.037.709-7 | Trade register Canton Zurich
Board of directors: Georg Polzer, Luciano Franceschina, Mark Schmitz, Yann de Vries

This e-mail message contains confidential information which is for the sole attention and use of the intended recipient. Please notify us at once if you think that it may not be intended for you and delete it immediately.