hadoop-hdfs-user mailing list archives

From Tim Broberg <Tim.Brob...@exar.com>
Subject Re: Understanding compression in hdfs
Date Sun, 29 Jul 2012 15:40:03 GMT
What if you wrote a CompressionOutputStream class that wraps around the existing ones and outputs
a hash per <n> bytes and a CompressionInputStream that checks them? ...and a Codec that
wraps your compressors around arbitrary existing codecs.

Sounds like a bunch of work, and I'm not sure where you would store the hashes, but it would
get the data into your clutches the instant it's available.
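A minimal sketch of the wrapping idea, using only JDK classes rather than Hadoop's actual `CompressionOutputStream` (which is abstract and codec-specific): wrap any `OutputStream` and record a SHA-1 digest for every fixed-size chunk of bytes that passes through. The chunk size, class name, and the in-memory hash list are all assumptions for illustration; in a real Codec wrapper you'd still have to decide where the hashes get persisted.

```java
import java.io.ByteArrayOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: in Hadoop, "out" would be the CompressionOutputStream
// returned by an existing codec; here it is any OutputStream.
public class HashingOutputStream extends FilterOutputStream {
    private static final int CHUNK_SIZE = 64 * 1024; // the "<n> bytes" per hash; arbitrary choice
    private final MessageDigest digest;
    private final List<byte[]> hashes = new ArrayList<>(); // where to store these is the open question
    private int bytesInChunk = 0;

    public HashingOutputStream(OutputStream out) throws NoSuchAlgorithmException {
        super(out);
        this.digest = MessageDigest.getInstance("SHA-1");
    }

    @Override
    public void write(int b) throws IOException {
        out.write(b);
        digest.update((byte) b);
        if (++bytesInChunk == CHUNK_SIZE) {
            hashes.add(digest.digest()); // digest() also resets the MessageDigest
            bytesInChunk = 0;
        }
    }

    @Override
    public void close() throws IOException {
        // Hash the final partial chunk, if any, before closing the underlying stream.
        if (bytesInChunk > 0) {
            hashes.add(digest.digest());
            bytesInChunk = 0;
        }
        super.close();
    }

    public List<byte[]> getHashes() { return hashes; }

    public static void main(String[] args) throws Exception {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        HashingOutputStream hos = new HashingOutputStream(sink);
        byte[] data = new byte[150 * 1024]; // 150 KiB -> two full 64 KiB chunks + one partial
        hos.write(data);
        hos.close();
        System.out.println(hos.getHashes().size());
        System.out.println(sink.size() == data.length);
    }
}
```

A matching checking `InputStream` would do the mirror image: digest each chunk as it is read and compare against the stored hashes. Note that `FilterOutputStream.write(byte[], int, int)` funnels through the single-byte `write`, which keeps this sketch correct but slow; a production version would override the array variant.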

    - Tim.

On Jul 29, 2012, at 7:41 AM, "Yaron Gonen" <yaron.gonen@gmail.com> wrote:

I've created a SequenceFile.Writer with block-level compression.
I'd like to create a SHA1 hash for each block written. How do I do that? I didn't see any
way to take control of the compression in order to know when a block is over.

