accumulo-notifications mailing list archives

From "Keith Turner (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ACCUMULO-980) support pluggable codecs for RFile
Date Fri, 01 Feb 2013 19:12:12 GMT

    [ https://issues.apache.org/jira/browse/ACCUMULO-980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13568980#comment-13568980 ]

Keith Turner commented on ACCUMULO-980:
---------------------------------------

Some comments on proposal V1.

It seems like the IV would be transparent to RFile; it would just be encryption header information
associated with a block, just like each gzip block probably has some header. From RFile's
perspective, it just needs to be able to read and write blocks of data. When the encryption
codec is not used, there is no per-block IV. Does this sound correct? Taking this a step
further, should encryption be pushed into BCFile? Currently RFile has no concept of compression;
it just reads and writes blocks of data to BCFile. BCFile handles compression and stores compression
metadata, like what codec to use for reading. Even RFile's own root meta block is stored as
a regular BCFile meta block and compressed like everything else. Seems like modifying BCFile
rather than RFile may be easier. I have already modified BCFile to support multi-level indexes
in 1.4. BCFile was copied because it was package private, but was not modified for a long
time.
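To sketch the idea of a per-block header that is transparent to RFile: the block layer could write a small header carrying the codec name and an optional IV ahead of each block payload, and consume it on read before handing data back. This is purely illustrative, not Accumulo code; the class and field names are hypothetical, and a zero-length IV stands in for "encryption codec not in use".

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical sketch of a per-block header at the BCFile level.
// RFile would never see this; the block layer reads it before
// returning the block's payload.
public class BlockHeader {
    final String codec;   // e.g. "gz", or a hypothetical encrypting codec
    final byte[] iv;      // zero-length when no encryption codec is in use

    BlockHeader(String codec, byte[] iv) {
        this.codec = codec;
        this.iv = iv;
    }

    void write(DataOutputStream out) throws IOException {
        out.writeUTF(codec);
        out.writeInt(iv.length);  // 0 for plain compression blocks
        out.write(iv);
    }

    static BlockHeader read(DataInputStream in) throws IOException {
        String codec = in.readUTF();
        byte[] iv = new byte[in.readInt()];
        in.readFully(iv);
        return new BlockHeader(codec, iv);
    }
}
```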

Why is another interface needed? Why not use org.apache.hadoop.io.compress.CompressionCodec?
I'm not saying we should or should not do this, but I would like to hear your thoughts since you
have looked into this. I see some things in the design doc that I suspect influence this
decision, like needing to set the Key and IV. While thinking about this I remembered the BigTable
paper mentioned using two compression codecs in series.
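Running two codecs in series is essentially stream composition, which is how a CompressionCodec-style interface wraps streams. A minimal sketch of the idea, assuming gzip first and then AES/CTR encryption (the key and IV handling here is illustrative only, not a proposed design):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;
import javax.crypto.Cipher;
import javax.crypto.CipherInputStream;
import javax.crypto.CipherOutputStream;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Sketch only: two transforms in series via stream wrapping --
// data is gzipped, and the gzip output is encrypted with AES/CTR.
public class SeriesCodec {
    static byte[] encode(byte[] data, SecretKeySpec key, byte[] iv) throws Exception {
        Cipher c = Cipher.getInstance("AES/CTR/NoPadding");
        c.init(Cipher.ENCRYPT_MODE, key, new IvParameterSpec(iv));
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (OutputStream out = new GZIPOutputStream(new CipherOutputStream(sink, c))) {
            out.write(data);
        }
        return sink.toByteArray();
    }

    static byte[] decode(byte[] data, SecretKeySpec key, byte[] iv) throws Exception {
        Cipher c = Cipher.getInstance("AES/CTR/NoPadding");
        c.init(Cipher.DECRYPT_MODE, key, new IvParameterSpec(iv));
        try (InputStream in = new GZIPInputStream(
                new CipherInputStream(new ByteArrayInputStream(data), c))) {
            return in.readAllBytes();
        }
    }
}
```

This kind of layering is what makes the interface question interesting: CompressionCodec already composes streams naturally, but it has no notion of per-block key or IV state.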

In the past we have not supported rolling upgrade from 1.x to 1.(x+1). We would only need to
consider this if 1.6 supported it. Changes in the file format would be a small part of a
larger effort to support rolling upgrade. Releases to date could always read a file produced
by any previous version, so Accumulo 1.4 can read RFiles produced by any previous version
of Accumulo.

Is there any concern with storing unencrypted blocks in memory? The code currently caches
uncompressed blocks (but still serialized with RFile encoding) in memory. Would this be a
concern if these cached blocks are swapped out? Would we want to keep blocks encrypted
in the cache and decrypt only as needed?
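The keep-blocks-encrypted option could look something like the following sketch: the cache retains only ciphertext and decrypts on each lookup, so plaintext never sits in cache memory that the OS might swap to disk. All names here are hypothetical, and a real version would need per-block IVs and a bounded eviction policy rather than a plain map.

```java
import java.util.HashMap;
import java.util.Map;
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical sketch of a block cache that stores ciphertext and
// decrypts only when a block is requested.
public class EncryptedBlockCache {
    private final Map<String, byte[]> cache = new HashMap<>();
    private final SecretKeySpec key;
    private final byte[] iv;  // a real design would use a per-block IV

    EncryptedBlockCache(SecretKeySpec key, byte[] iv) {
        this.key = key;
        this.iv = iv;
    }

    void put(String blockName, byte[] plaintext) throws Exception {
        Cipher c = Cipher.getInstance("AES/CTR/NoPadding");
        c.init(Cipher.ENCRYPT_MODE, key, new IvParameterSpec(iv));
        cache.put(blockName, c.doFinal(plaintext));  // only ciphertext retained
    }

    byte[] get(String blockName) throws Exception {
        byte[] enc = cache.get(blockName);
        if (enc == null) return null;
        Cipher c = Cipher.getInstance("AES/CTR/NoPadding");
        c.init(Cipher.DECRYPT_MODE, key, new IvParameterSpec(iv));
        return c.doFinal(enc);  // decrypt on demand, per lookup
    }
}
```

The trade-off is a decryption cost on every cache hit, which is part of why the question above is worth weighing against the swap-out risk.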

                
> support pluggable codecs for RFile
> ----------------------------------
>
>                 Key: ACCUMULO-980
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-980
>             Project: Accumulo
>          Issue Type: Improvement
>            Reporter: Adam Fuchs
>            Assignee: Adam Fuchs
>             Fix For: 1.6.0
>
>         Attachments: RFile-Changes-Proposal-V1.pdf
>
>
> As part of the encryption at rest story, RFile should support pluggable modules where
it currently has hardcoded options for compression codecs. This is a natural place to add
encryption capabilities, as the cost of encryption would likely not be significantly different
from the cost of compression, and the block-level integration should maintain the same seek
and scan performance. Given the many implementation options for both encryption and compression,
it makes sense to have a plugin structure here.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
