accumulo-dev mailing list archives

From z11373 <z11...@outlook.com>
Subject table size questions
Date Tue, 08 Sep 2015 13:19:00 GMT
I have 3 tables. All of them use the same column family name and an empty
column qualifier.
For the row id, let's say each table has something like the following ('|' is
a delimiter char in this context).

Table1:
A|B|C

Table2:
B|C|A

Table3:
C|A|B

As shown above, all three tables have essentially the same content (and in
fact the same row id length), and they all have the same number of rows (I
have verified it): 2,181,193 rows.
However, when I check the table sizes I get different results:
root@dev> du -h -t Table1
   17.70M [Table1]
root@dev> du -h -t Table2
   27.58M [Table2]
root@dev> du -h -t Table3
   32.48M [Table3]

I am a bit surprised to see different results, but I realize that
Accumulo applies compression to the data. Looking at these table sizes,
am I right to conclude that A|B|C somehow compresses better than B|C|A,
which in turn compresses better than C|A|B?
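
To illustrate what I mean, here is a rough sketch (not Accumulo code) that
builds the same synthetic rows in the three component orders, sorts them the
way a sorted store would, and compresses each list with zlib. The
cardinalities of A, B and C and the helper names (make_rows, compressed_size,
compare_orderings) are made up for the experiment, and zlib is only a
stand-in for whatever codec the table is actually configured with, so the
numbers only hint at whether ordering alone can change the compressed size.

```python
import random
import zlib


def make_rows(n, seed=0):
    """Generate n synthetic (a, b, c) rows. Cardinalities are assumptions."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        a = f"a{rng.randrange(5):04d}"        # few distinct values (assumed)
        b = f"b{rng.randrange(1000):04d}"     # moderate cardinality (assumed)
        c = f"c{rng.randrange(16 ** 8):08x}"  # nearly unique values (assumed)
        rows.append((a, b, c))
    return rows


def compressed_size(row_ids):
    """Sort the row ids, join them, and return the zlib-compressed length."""
    blob = "\n".join(sorted(row_ids)).encode("utf-8")
    return len(zlib.compress(blob, 6))


def compare_orderings(rows):
    """Return the compressed size for each of the three component orders."""
    orderings = {
        "A|B|C": lambda a, b, c: (a, b, c),
        "B|C|A": lambda a, b, c: (b, c, a),
        "C|A|B": lambda a, b, c: (c, a, b),
    }
    return {
        name: compressed_size("|".join(pick(*row)) for row in rows)
        for name, pick in orderings.items()
    }


if __name__ == "__main__":
    for name, size in compare_orderings(make_rows(100_000)).items():
        print(f"{name}: {size} bytes compressed")
```

All three inputs have the same uncompressed length, so any difference in the
printed sizes comes from ordering alone. Running this on a sample of the real
row ids instead of the synthetic ones would be more telling.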

This makes it harder for me to give management an estimate of the disk
space we need to store our data in Accumulo. Earlier I was thinking that if
we can guesstimate how many rows we may have in the future, we could
multiply that by a factor x (and perhaps also by 3 for replication) and give
that as the estimate, but now I can't even figure out that 'x'. Does anyone
have experience with this that they can share?
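
For reference, this is the arithmetic I had in mind, as a small sketch. It
derives a per-table 'x' (bytes per row) from the du output above and scales
it to a future row count. It assumes that the "M" printed by du means
1024 * 1024 bytes and that the du figures do not already include
replication; both are my assumptions, and the function names are only
illustrative.

```python
MIB = 1024 * 1024  # assumption: "M" in the du output is 1024 * 1024 bytes

ROWS = 2_181_193  # row count of each table, as verified above
DU_MIB = {"Table1": 17.70, "Table2": 27.58, "Table3": 32.48}


def bytes_per_row(size_mib, rows):
    """Observed on-disk bytes per row: the factor 'x' for one table."""
    return size_mib * MIB / rows


def estimate_bytes(future_rows, per_row, replication=3):
    """Projected raw disk usage for future_rows at the given bytes per row."""
    return future_rows * per_row * replication


if __name__ == "__main__":
    for table, size_mib in DU_MIB.items():
        x = bytes_per_row(size_mib, ROWS)
        projected = estimate_bytes(100_000_000, x)  # hypothetical row count
        print(f"{table}: x = {x:.2f} bytes/row, "
              f"100M rows x3 = {projected / MIB:.0f} MiB")
```

Since 'x' differs per table, the largest observed value (Table3 here) looks
like the safer planning number, but it only holds if future data compresses
about as well as the current sample.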

Thanks,
Z


