hadoop-common-user mailing list archives

From Da Zheng <zhengda1...@gmail.com>
Subject Re: replicate data in HDFS with smarter encoding
Date Tue, 19 Jul 2011 03:52:40 GMT
So this kind of feature is desired by the community?

It seems this implementation can only reduce the data size on disk
after the fact, through the background daemon RaidNode; it cannot
reduce the disk bandwidth and network bandwidth used when the client
writes data to HDFS. It might be more interesting to reduce those as
well, although that might require modifying the implementation of the
pipeline in HDFS.
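
To make the bandwidth point concrete, here is a rough back-of-the-envelope
sketch. The numbers are assumptions for illustration only: a replication
factor of 3, and a hypothetical code with 10 data pieces and 4 parity
pieces. It is not a description of how HDFS RAID is configured.

```python
def bytes_moved(block_bytes, replication=3, data_pieces=10, parity_pieces=4):
    """Bytes written across the cluster for one block under each scheme.

    All parameters are illustrative assumptions, not HDFS defaults
    taken from this thread.
    """
    # Plain replication: every replica is a full copy of the block.
    replicated = block_bytes * replication
    # Encoding at write time: the block plus its parity pieces.
    encoded = block_bytes * (data_pieces + parity_pieces) / data_pieces
    return replicated, encoded


replicated, encoded = bytes_moved(64 * 1024 * 1024)
print(f"replication:   {replicated / 2**20:.1f} MiB written")
print(f"encoded write: {encoded / 2**20:.1f} MiB written")
```

With a background daemon the client still pays the full replication cost
at write time, and the saving only shows up later on disk. Encoding inside
the write path would be needed to save bandwidth as well.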


On 07/18/11 04:10, Joey Echeverria wrote:
> Facebook contributed some code to do something similar called HDFS RAID:
> http://wiki.apache.org/hadoop/HDFS-RAID
> -Joey
> On Jul 18, 2011, at 3:41, Da Zheng<zhengda1936@gmail.com>  wrote:
>> Hello,
>> It seems that data replication in HDFS is simply data copy among nodes. Has
>> anyone considered using a better encoding to reduce the data size? Say, a block
>> of data is split into N pieces, and as long as M pieces of data survive in the
>> network, we can regenerate the original data.
>> There are many benefits to reducing the data size. It can save network and disk
>> bandwidth, and thus reduce energy consumption. Computation power might be a
>> concern, but we can use GPUs to encode and decode.
>> But maybe the idea is stupid or it's hard to reduce the data size. I would like
>> to hear your comments.
>> Thanks,
>> Da
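
The "split into N pieces, any M are enough" idea above is an erasure code.
A minimal sketch of the simplest case follows, assuming a single XOR parity
piece (so N = k + 1 and M = k). The function names are hypothetical and
this is not the HDFS RAID code; tolerating more than one lost piece needs a
stronger code such as Reed-Solomon.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(block: bytes, k: int) -> list:
    """Split block into k data pieces and append one XOR parity piece."""
    piece_len = -(-len(block) // k)  # ceiling division
    padded = block.ljust(piece_len * k, b"\0")
    pieces = [padded[i * piece_len:(i + 1) * piece_len] for i in range(k)]
    return pieces + [reduce(xor_bytes, pieces)]


def decode(pieces: list, original_len: int) -> bytes:
    """Rebuild the block; at most one piece may be missing (None)."""
    missing = [i for i, p in enumerate(pieces) if p is None]
    if len(missing) > 1:
        raise ValueError("single parity can only recover one lost piece")
    if missing:
        # The XOR of all k + 1 pieces is zero, so the XOR of the
        # survivors equals the lost piece.
        survivors = [p for p in pieces if p is not None]
        pieces = list(pieces)
        pieces[missing[0]] = reduce(xor_bytes, survivors)
    # Drop the parity piece and the zero padding.
    return b"".join(pieces[:-1])[:original_len]


block = b"hello hdfs, this is one block"
pieces = encode(block, k=4)   # 5 pieces in total
pieces[2] = None              # simulate losing one piece
assert decode(pieces, len(block)) == block
```

Here the storage cost is (k + 1) / k times the block size instead of a full
extra copy per replica, at the price of extra computation on recovery.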
