accumulo-dev mailing list archives

From z11373 <>
Subject table size questions
Date Tue, 08 Sep 2015 13:19:00 GMT
I have 3 tables, all with the same column family name and an empty column qualifier.
The row IDs in each table look something like the following ('|' is a
delimiter character here).




As we can see above, all three tables have essentially the same content (even
the same row ID length), and they all have the same number of rows, which I
have verified: 2,181,193 rows.
However, when I check the table sizes, I get different results:
root@dev> du -h -t Table1
   17.70M [Table1]
root@dev> du -h -t Table2
   27.58M [Table2]
root@dev> du -h -t Table3
   32.48M [Table3]

I was a bit surprised by the difference, but then I remembered that Accumulo
compresses the data. Looking at these sizes, am I right to conclude that
A|B|C somehow compresses better than B|C|A, which in turn compresses better
than C|A|B?
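One plausible intuition, offered as an assumption rather than a description of Accumulo's file format: keys are stored in sorted order and compressed in blocks, so the more leading characters each row ID shares with the one before it, the more redundancy a compressor can exploit. The sketch below builds the same synthetic A/B/C combinations in all three field orders and reports, for each order, the raw size, the zlib-compressed size, and the average prefix each sorted key shares with its predecessor. The field values and their cardinalities (2, 10 and 100 distinct values) are made up for illustration.

```python
import itertools
import zlib

# Hypothetical field values; these cardinalities are invented for illustration.
A_VALUES = [f"a{i}" for i in range(2)]
B_VALUES = [f"b{i:02d}" for i in range(10)]
C_VALUES = [f"c{i:03d}" for i in range(100)]


def sorted_keys(order):
    """Build every (A, B, C) combination, join the fields in the given order
    with '|', and sort the row IDs as a sorted key-value store would."""
    keys = []
    for a, b, c in itertools.product(A_VALUES, B_VALUES, C_VALUES):
        fields = {"A": a, "B": b, "C": c}
        keys.append("|".join(fields[f] for f in order))
    return sorted(keys)


def common_prefix_len(x, y):
    """Number of leading characters that x and y have in common."""
    n = 0
    for cx, cy in zip(x, y):
        if cx != cy:
            break
        n += 1
    return n


def avg_shared_prefix(keys):
    """Average prefix length each key shares with the key sorted just before it."""
    total = sum(common_prefix_len(prev, cur) for prev, cur in zip(keys, keys[1:]))
    return total / (len(keys) - 1)


def stats(order):
    keys = sorted_keys(order)
    raw = "\n".join(keys).encode()
    return {
        "raw_bytes": len(raw),
        "zlib_bytes": len(zlib.compress(raw)),
        "avg_shared_prefix": avg_shared_prefix(keys),
    }


if __name__ == "__main__":
    for order in ("ABC", "BCA", "CAB"):
        print("|".join(order), stats(order))
```

The raw size is identical for every order, yet the shared-prefix figure is not. With these made-up cardinalities A|B|C shares the most, but C|A|B shares more than B|C|A, so the ranking depends on how many distinct values each field has and need not match the three tables above. Running the same comparison on a sample of real row IDs would be more informative.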

This makes it harder to give management an estimate of the disk space we
need to store our data in Accumulo. Earlier I thought that if we could
guesstimate how many rows we may have in the future, I could multiply that
by a factor x (and perhaps by 3 for replication) and use the result as the
estimate, but now I can't even figure out what x should be. Does anyone have
experience with this that they could share?
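A minimal sketch of the estimate described above, assuming x is measured empirically: divide an existing table's `du` size by its row count to get bytes per row, then scale by the projected row count and the replication factor. The helper names are hypothetical, the assumption that the shell's `du -h` uses 1024-based suffixes should be checked against your Accumulo version, and the future row count in the usage line is a placeholder.

```python
# Assumption: the Accumulo shell's `du -h` uses 1024-based suffixes; verify this.
SIZE_SUFFIXES = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3, "T": 1024 ** 4}


def parse_size(text: str) -> float:
    """Convert a human-readable size such as '32.48M' into bytes."""
    text = text.strip()
    if text and text[-1] in SIZE_SUFFIXES:
        return float(text[:-1]) * SIZE_SUFFIXES[text[-1]]
    return float(text)


def estimate_disk_bytes(measured_size: str, measured_rows: int,
                        future_rows: int, replication: int = 3) -> float:
    """Hypothetical helper: derive bytes per row ('x') from a measured table,
    scale it to a projected row count, and multiply by the replication factor."""
    bytes_per_row = parse_size(measured_size) / measured_rows
    return future_rows * bytes_per_row * replication


if __name__ == "__main__":
    # Conservative baseline: the largest of the three tables (Table3).
    # 1 billion future rows is a placeholder, not a figure from this thread.
    estimate = estimate_disk_bytes("32.48M", 2_181_193, 1_000_000_000)
    print(f"~{estimate / 1024 ** 4:.2f} TiB")
```

Because x measured this way already reflects compression, it only carries over if future row IDs keep a similar layout and field ordering; taking the worst of the three tables leaves some headroom for that.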

