accumulo-user mailing list archives

From "Slater, David M." <>
Subject sum of mutation.numBytes() significantly different from rfile size
Date Tue, 29 Oct 2013 21:50:08 GMT

I'm seeing about an order of magnitude difference between the number of bytes returned by
mutation.numBytes() and the size of the rfiles on disk (Accumulo 1.4.2). Note that all of
my mutations are new entries, and there are no combiners running.

While I understand that there is some compression on the rfile, I would be really surprised
if it were 10:1.

My entries are composed of a row ID (most of which is identical to the previous row ID),
an empty column family, a nonempty column qualifier (which likely shares a lot with the previous
qualifier), and an empty value. An example of the row ID and column qualifier might be:

(forward table)
0000000000000|9|fa19        IP|
0000000000000|9|fa19        PORT|00080
0000000000000|9|fa22        IP|
<timeblock>|<hash>|<uid>    <index>|<textual value>

(reverse table)
0000000000000|IP|         fa19
0000000000000|IP|         fd02
0000000000000|IP|         123
0000000000000|PORT|00080                      fa19
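To put a rough number on how repetitive these keys are, here is a minimal Java sketch that counts sorted entries whose row ID exactly repeats the previous entry's, and how many row bytes that repetition accounts for. My understanding is that RFile's relative key encoding can avoid rewriting a field that is identical to the previous key's, so repeated rows should cost very little on disk. The class and method names are hypothetical, and the sketch only measures redundancy; it does not reproduce the real on-disk format.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical helper: measures exact row-ID repetition between consecutive sorted keys.
// This is NOT RFile's encoding; it only shows how much of the row data is redundant.
public class RepeatedRows {
    // Number of entries whose row ID equals the previous entry's row ID.
    static int countRepeatedRows(List<String> rows) {
        int repeated = 0;
        for (int i = 1; i < rows.size(); i++) {
            if (rows.get(i).equals(rows.get(i - 1))) {
                repeated++;
            }
        }
        return repeated;
    }

    // Total row bytes that an encoding which skips "same as previous row" could avoid writing.
    static long repeatedRowBytes(List<String> rows) {
        long bytes = 0;
        for (int i = 1; i < rows.size(); i++) {
            if (rows.get(i).equals(rows.get(i - 1))) {
                bytes += rows.get(i).getBytes(StandardCharsets.UTF_8).length;
            }
        }
        return bytes;
    }

    public static void main(String[] args) {
        // Row IDs copied from the forward-table example.
        List<String> rows = List.of(
                "0000000000000|9|fa19",
                "0000000000000|9|fa19",
                "0000000000000|9|fa22");
        System.out.println("repeated rows: " + countRepeatedRows(rows));
        System.out.println("repeated row bytes: " + repeatedRowBytes(rows));
    }
}
```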

The numBytes() method appears to return a number of bytes equal to the string length of the
row ID and column qualifiers, plus 26 bytes for each column qualifier.
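The observed relationship can be written as a small Java helper. This is a sketch of the empirical formula above, not Accumulo's implementation of Mutation.numBytes(); the class name is hypothetical, the 26-byte constant is simply the value observed here, and it assumes an empty column family, empty visibility and empty value, as in these entries.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;

// Sketch of the *observed* numBytes() behavior, not Accumulo's source code:
//   estimate = len(rowId) + sum over qualifiers of (len(cq) + 26)
// Assumes empty column family, empty visibility and empty value.
public class NumBytesEstimate {
    static final int OBSERVED_OVERHEAD_PER_QUALIFIER = 26; // empirically observed constant

    static long estimate(String rowId, List<String> qualifiers) {
        long total = rowId.getBytes(StandardCharsets.UTF_8).length; // row ID counted once
        for (String cq : qualifiers) {
            total += cq.getBytes(StandardCharsets.UTF_8).length + OBSERVED_OVERHEAD_PER_QUALIFIER;
        }
        return total;
    }

    public static void main(String[] args) {
        // One mutation for a forward-table row carrying the two qualifiers from the example.
        long est = estimate("0000000000000|9|fa19", List.of("IP|", "PORT|00080"));
        System.out.println("estimated numBytes: " + est);
    }
}
```

Summing this estimate over all mutations gives the "raw" figure that is being compared against the rfile sizes.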

Is there something else that I'm missing, or would this possibly compress by that much?
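One way to sanity-check a 10:1 ratio is to compress data shaped like these keys. The Java sketch below generates synthetic sorted entries modeled on the forward table (the uid values are made up) and compresses them with java.util.zip.Deflater, which implements DEFLATE, the algorithm behind gzip. As far as I know, RFile blocks are compressed with gz by default (the table.file.compress.type property), but this sketch does not build a real RFile, so treat the printed ratio as a rough indication only.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

// Rough illustration only: builds repetitive, sorted key text shaped like the forward
// table and measures how well plain DEFLATE compresses it. Not a real RFile.
public class CompressionRatioDemo {
    // Hypothetical layout: <timeblock>|<hash>|<uid> <tab> <index>|<textual value>
    static byte[] syntheticKeys(int uids) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < uids; i++) {
            String row = "0000000000000|9|" + Integer.toHexString(0xfa00 + i); // made-up uids
            sb.append(row).append('\t').append("IP|").append('\n');
            sb.append(row).append('\t').append("PORT|00080").append('\n');
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    // Returns the size in bytes of the DEFLATE-compressed input.
    static int deflatedSize(byte[] input) {
        Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION);
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        return out.size();
    }

    public static void main(String[] args) {
        byte[] raw = syntheticKeys(10_000);
        int packed = deflatedSize(raw);
        System.out.printf("raw=%d bytes, deflated=%d bytes, ratio=%.1f:1%n",
                raw.length, packed, (double) raw.length / packed);
    }
}
```

Real data with more varied hashes and textual values would compress less than this synthetic set, so the result is an upper-bound style sanity check rather than a prediction.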

