accumulo-user mailing list archives

From Josh Elser <josh.el...@gmail.com>
Subject Re: sum of mutation.numBytes() significantly different from rfile size
Date Tue, 29 Oct 2013 22:02:46 GMT
GZ typically compresses text fairly well (assuming that's the 
compression codec that you're using).

I don't believe 1.4 has anything extra at the RFile level for size 
savings; however, I think that 1.5+ has some additional encoding to 
reduce the size on disk.
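
As a rough sanity check on the 10:1 figure, here is a minimal sketch that builds made-up sorted keys shaped like the forward-table example and runs them through zlib (DEFLATE, the same algorithm gzip uses). It does not model the RFile format, timestamps, or block boundaries; the function names and the synthetic data are my own assumptions, so treat the ratio as indicative only.

```python
import zlib


def synthetic_forward_keys(n_uids=2000):
    """Build sorted keys shaped like the forward-table example (made-up data)."""
    keys = []
    for uid in range(n_uids):
        # Hypothetical row: <timeblock>|<hash>|<uid>, with only the uid varying.
        row = "0000000000000|9|%04x" % uid
        # Hypothetical qualifiers: <index>|<textual value>.
        ip = "IP|127.000.%03d.%03d" % (uid // 256 % 256, uid % 256)
        keys.append(row + "\x00" + ip)
        keys.append(row + "\x00" + "PORT|00080")
    return sorted(keys)


def compression_ratio(keys):
    """Return raw size divided by DEFLATE-compressed size for the joined keys."""
    raw = "\n".join(keys).encode("utf-8")
    compressed = zlib.compress(raw, 6)
    return len(raw) / len(compressed)


if __name__ == "__main__":
    ratio = compression_ratio(synthetic_forward_keys())
    print("raw/compressed ratio: %.1f" % ratio)
```

Keys that share long prefixes with their neighbours, as yours do, are close to the best case for this kind of compressor, so running something like this on a sample of your real keys should tell you whether the on-disk size is plausible.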

On 10/29/13, 5:50 PM, Slater, David M. wrote:
> Hello,
>
> I’m seeing about an order of magnitude difference between the number of
> bytes returned by mutation.numBytes() and the size of the rfiles on disk
> (Accumulo 1.4.2). Note that all of my mutations are new entries, and
> there are no combiners running.
>
> While I understand that there is some compression on the rfile, I would
> be really surprised if it was 10:1.
>
> My entries are composed of a row ID (most of which is equivalent to the
> previous row ID), an empty column family, a nonempty column qualifier
> (which likely shares a lot with the previous qualifier), and an empty
> value. An example of the rowID and column qualifier might be:
>
> (forward table)
>
> 0000000000000|9|fa19    IP|127.000.000.001
> 0000000000000|9|fa19    PORT|00080
> …
> 0000000000000|9|fa22    IP|128.032.144.139
> …
>
> <timeblock>|<hash>|<uid>    <index>|<textual value>
>
> OR
>
> (reverse table)
>
> 0000000000000|IP|127.000.000.001    fa19
> 0000000000000|IP|127.000.000.001    fd02
> 0000000000000|IP|127.000.000.002    123
> …
> 0000000000000|PORT|00080            fa19
>
> The numBytes() method appears to return a number of bytes equal to the
> string length of the row ID and column qualifiers, plus 26 * # of column
> qualifiers.
>
> Is there something else that I’m missing, or would this possibly
> compress by that much?
>
> Thanks,
>
> David
>
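
For reference, the size estimate David describes above can be written out as a small function. This is only a sketch of his observation, not Accumulo's implementation of Mutation.numBytes(); the name estimate_num_bytes is hypothetical, and counting the row ID once per mutation with a flat 26 bytes per column qualifier is my reading of his description.

```python
# Per-entry constant reported in the quoted message (an observation, not a
# documented Accumulo value).
OBSERVED_OVERHEAD_PER_QUALIFIER = 26


def estimate_num_bytes(row_id, qualifiers):
    """Estimate a mutation's size from the heuristic in the quoted message.

    Assumes single-byte characters, so string length equals byte length,
    and that the row ID is counted once per mutation.
    """
    key_bytes = len(row_id) + sum(len(q) for q in qualifiers)
    return key_bytes + OBSERVED_OVERHEAD_PER_QUALIFIER * len(qualifiers)


if __name__ == "__main__":
    row = "0" * 13 + "|9|fa19"
    quals = ["IP|127.000.000.001", "PORT|00080"]
    print(estimate_num_bytes(row, quals))
```

Summing this over all mutations gives the uncompressed figure to compare against the rfile sizes on disk.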
