accumulo-user mailing list archives

From Oliver Swoboda <>
Subject Comparing Schemes in Accumulo
Date Mon, 07 Nov 2016 11:54:31 GMT

I've stored weather data in two tables with different schemes. Scheme1 uses
the month and station ID for the row key (e.g. 201601_GME00102292) and the
day of the month (1-31) as the version (timestamp). Scheme2 uses the year
and station ID for the row key (e.g. 2016_GME00102292) and the day of the
year (1-366) as the version. Of course, the versioning iterator has been
removed from both tables. Because I have different metrics, like the minimum
and maximum temperature of a day, I'm using locality groups, one group per
metric (e.g. setgroups TMIN=TMIN, TMAX=TMAX). Additionally, I've pre-split
the tables by year (e.g. 2014, 2015, 2016, ...).

Now to my question: if I do a full table scan with a batch scanner, Scheme2
is always faster than Scheme1 (with 2.5 billion entries, the Scheme1 scan
took 24 minutes and the Scheme2 scan took 21 minutes). Why is that? Is it
because fewer seeks are made with Scheme2? It would be nice if someone could
help me understand what's happening here.
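To put the quoted numbers side by side, the sketch below does a little arithmetic: how many rows one station-year is spread across under each scheme, how many days end up under one row, and the average scan throughput implied by 2.5 billion entries in 24 vs 21 minutes. It assumes one value per station, metric and day; the names are hypothetical, and the figures only describe the layouts and timings, they do not by themselves explain the difference.

```java
// Hypothetical back-of-the-envelope numbers for the two layouts.
// Assumes one value per station, metric and day; uses only the figures
// quoted in the message (2.5 billion entries, 24 vs 21 minutes).
public class SchemeComparison {

    // How many rows one station's data for one year is spread across.
    enum Scheme {
        SCHEME1(12), // one row per month: yyyyMM_station
        SCHEME2(1);  // one row per year:  yyyy_station

        final int rowsPerStationYear;

        Scheme(int rowsPerStationYear) {
            this.rowsPerStationYear = rowsPerStationYear;
        }

        // Average number of days (timestamps) stored under one row, per metric.
        double daysPerRow(int daysInYear) {
            return (double) daysInYear / rowsPerStationYear;
        }
    }

    // Average scan throughput in entries per second.
    static double entriesPerSecond(double entries, double minutes) {
        return entries / (minutes * 60.0);
    }

    public static void main(String[] args) {
        for (Scheme s : Scheme.values()) {
            System.out.printf("%s: %d rows per station-year, %.1f days per row%n",
                    s, s.rowsPerStationYear, s.daysPerRow(366));
        }
        double entries = 2.5e9;
        System.out.printf("Scheme1: %.0f entries/s%n", entriesPerSecond(entries, 24));
        System.out.printf("Scheme2: %.0f entries/s%n", entriesPerSecond(entries, 21));
        System.out.printf("Scheme2 scan time saving: %.1f%%%n", 100.0 * (24 - 21) / 24);
    }
}
```

In a leap year Scheme1 stores about 30.5 days per row across 12 rows, while Scheme2 stores all 366 days under a single row; the 3-minute difference corresponds to a 12.5% shorter scan.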

Yours faithfully,
Oliver Swoboda
