hadoop-hdfs-user mailing list archives

From Brock Noland <br...@cloudera.com>
Subject Re: Understanding compression in hdfs
Date Sun, 29 Jul 2012 17:41:12 GMT
Also note that HDFS already computes checksums, which I believe you can retrieve:

http://hadoop.apache.org/common/docs/r1.0.3/api/org/apache/hadoop/fs/FileSystem.html#getFileChecksum(org.apache.hadoop.fs.Path)

http://hadoop.apache.org/common/docs/r1.0.3/hdfs_design.html#Data+Integrity

Brock

On Sun, Jul 29, 2012 at 12:35 PM, Yaron Gonen <yaron.gonen@gmail.com> wrote:

> Thanks!
> I'll dig into those classes to figure out my next step.
>
> Anyway, I just realized that block-level compression has nothing to do with
> HDFS blocks. An HDFS block can contain an unknown number of compressed
> blocks, which makes my efforts kind of worthless.
>
> Thanks again!
>
>
> On Sun, Jul 29, 2012 at 6:40 PM, Tim Broberg <Tim.Broberg@exar.com> wrote:
>
>>  What if you wrote a CompressionOutputStream class that wraps around the
>> existing ones and outputs a hash per <n> bytes and a CompressionInputStream
>> that checks them? ...and a Codec that wraps your compressors around
>> arbitrary existing codecs.
>>
>>  Sounds like a bunch of work, and I'm not sure where you would store the
>> hashes, but it would get the data into your clutches the instant it's
>> available.
>>
>>     - Tim.
>>
>> On Jul 29, 2012, at 7:41 AM, "Yaron Gonen" <yaron.gonen@gmail.com> wrote:
>>
>>   Hi,
>> I've created a SequenceFile.Writer with block-level compression.
>> I'd like to create a SHA1 hash for each block written. How do I do that?
>> I don't see any way to hook into the compression so that I know when a
>> block ends.
>>
>>  Thanks,
>> Yaron
>>
>>
>
>
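
Tim's suggestion quoted above (a wrapping output stream that emits a hash per
<n> bytes, plus a reader that checks them) could be sketched roughly as
follows. This is only a minimal illustration built on plain java.io and
java.security; it does not implement Hadoop's CompressionOutputStream or codec
interfaces. The class name ChunkHashingOutputStream, the chunkSize parameter,
the verify helper, and the in-memory list of hashes are all assumptions made
for the example, since the thread leaves open where the hashes would be stored.

```java
import java.io.ByteArrayOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical wrapper: passes bytes through and records one SHA-1 per chunkSize bytes. */
class ChunkHashingOutputStream extends FilterOutputStream {
    private final int chunkSize;
    private final MessageDigest digest;
    private final List<byte[]> hashes = new ArrayList<>();
    private int inChunk = 0; // bytes hashed so far in the current chunk

    ChunkHashingOutputStream(OutputStream out, int chunkSize) throws NoSuchAlgorithmException {
        super(out);
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        this.chunkSize = chunkSize;
        this.digest = MessageDigest.getInstance("SHA-1");
    }

    @Override
    public void write(int b) throws IOException {
        out.write(b);
        digest.update((byte) b);
        if (++inChunk == chunkSize) {
            sealChunk();
        }
    }

    @Override
    public void write(byte[] buf, int off, int len) throws IOException {
        out.write(buf, off, len); // forward first; invalid arguments fail here
        while (len > 0) {
            // Feed the digest only up to the next chunk boundary.
            int n = Math.min(len, chunkSize - inChunk);
            digest.update(buf, off, n);
            inChunk += n;
            off += n;
            len -= n;
            if (inChunk == chunkSize) {
                sealChunk();
            }
        }
    }

    @Override
    public void close() throws IOException {
        if (inChunk > 0) {
            sealChunk(); // hash the trailing partial chunk
        }
        super.close();
    }

    private void sealChunk() {
        hashes.add(digest.digest()); // digest() also resets the MessageDigest
        inChunk = 0;
    }

    /** Hashes recorded so far, in chunk order (kept in memory in this sketch). */
    List<byte[]> getHashes() {
        return hashes;
    }

    /** Reader-side check: recomputes the per-chunk hashes of data and compares them. */
    static boolean verify(byte[] data, int chunkSize, List<byte[]> expected)
            throws IOException, NoSuchAlgorithmException {
        ChunkHashingOutputStream s =
                new ChunkHashingOutputStream(new ByteArrayOutputStream(), chunkSize);
        s.write(data);
        s.close();
        List<byte[]> actual = s.getHashes();
        if (actual.size() != expected.size()) {
            return false;
        }
        for (int i = 0; i < actual.size(); i++) {
            if (!MessageDigest.isEqual(actual.get(i), expected.get(i))) {
                return false;
            }
        }
        return true;
    }
}
```

Placed underneath a compressor, a stream like this would hash the compressed
bytes; placed above it, the raw bytes. Either way the chunks are fixed-size
byte ranges, so they line up with neither SequenceFile compressed blocks nor
HDFS blocks.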


-- 
Apache MRUnit - Unit testing MapReduce - http://incubator.apache.org/mrunit/
