hadoop-common-user mailing list archives

From: Joey Echeverria <j...@cloudera.com>
Subject: Re: replicate data in HDFS with smarter encoding
Date: Mon, 18 Jul 2011 11:10:24 GMT
Facebook contributed code that does something similar, called HDFS RAID:

http://wiki.apache.org/hadoop/HDFS-RAID

-Joey


On Jul 18, 2011, at 3:41, Da Zheng <zhengda1936@gmail.com> wrote:

> Hello,
> 
> It seems that data replication in HDFS is simply copying data among nodes. Has
> anyone considered using a better encoding to reduce the data size? Say, a block
> of data is split into N pieces, and as long as M pieces survive in the
> network, we can regenerate the original data.
> 
> There are many benefits to reducing the data size. It can save network
> bandwidth and disk space, and thus reduce energy consumption. Computational
> power might be a concern, but we could use GPUs to encode and decode.
> 
> But maybe the idea is stupid, or it's harder to reduce the data size than it
> seems. I would like to hear your comments.
> 
> Thanks,
> Da
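
As a concrete illustration of the (N, M) idea above, here is a minimal sketch
in Java of the simplest case, (N+1) XOR parity, which (as I understand it) is
one of the codecs HDFS RAID provides alongside Reed-Solomon. The class and
method names are hypothetical and purely illustrative, not HDFS RAID's actual
API:

    // Hypothetical illustration: (N+1) XOR parity, the simplest erasure code.
    // N data blocks produce one parity block; any single lost block (data or
    // parity) can be rebuilt by XOR-ing the surviving blocks together.
    public class XorParityDemo {
        // Compute a parity block as the bytewise XOR of equal-sized blocks.
        static byte[] xor(byte[][] blocks) {
            byte[] p = new byte[blocks[0].length];
            for (byte[] b : blocks) {
                for (int i = 0; i < p.length; i++) {
                    p[i] ^= b[i];
                }
            }
            return p;
        }

        public static void main(String[] args) {
            // Three equal-sized "blocks" standing in for HDFS blocks.
            byte[][] data = {
                "block-0!".getBytes(),
                "block-1!".getBytes(),
                "block-2!".getBytes(),
            };
            byte[] parity = xor(data);  // store this instead of extra replicas

            // Simulate losing data[1]: XOR the survivors with the parity block.
            byte[] rebuilt = xor(new byte[][] { data[0], data[2], parity });
            System.out.println(new String(rebuilt));  // prints "block-1!"
        }
    }

XOR can only rebuild one lost piece per stripe; Reed-Solomon generalizes this
so that any M surviving pieces are enough to recover the data, which buys
failure tolerance at far less storage overhead than plain 3x replication.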
