hadoop-common-user mailing list archives

From Uma Maheswara Rao G 72686 <mahesw...@huawei.com>
Subject Re: replicate data in HDFS with smarter encoding
Date Tue, 19 Jul 2011 04:43:02 GMT
Hi,

We have already given some thought to this.

It looks like you are referring to these features:
https://issues.apache.org/jira/browse/HDFS-1640
https://issues.apache.org/jira/browse/HDFS-2115

However, the implementation is not yet ready in trunk.


Regards,
Uma

----- Original Message -----
From: Da Zheng <zhengda1936@gmail.com>
Date: Tuesday, July 19, 2011 9:23 am
Subject: Re: replicate data in HDFS with smarter encoding
To: common-user@hadoop.apache.org
Cc: Joey Echeverria <joey@cloudera.com>, "hdfs-user@hadoop.apache.org" <hdfs-user@hadoop.apache.org>

> So this kind of feature is desired by the community?
> 
> It seems this implementation only reduces the on-disk data size via
> the background RaidNode daemon; it cannot reduce the disk and network
> bandwidth consumed when the client writes data to HDFS. It might be
> more interesting to reduce that disk and network bandwidth, although
> it would likely require modifying the HDFS write pipeline.
> 
> Thanks,
> Da
> 
> 
> On 07/18/11 04:10, Joey Echeverria wrote:
> > Facebook contributed some code to do something similar called HDFS RAID:
> >
> > http://wiki.apache.org/hadoop/HDFS-RAID
> >
> > -Joey
> >
> >
> > On Jul 18, 2011, at 3:41, Da Zheng<zhengda1936@gmail.com>  wrote:
> >
> >> Hello,
> >>
> >> It seems that data replication in HDFS is simply a data copy among
> >> nodes. Has anyone considered using a better encoding to reduce the
> >> data size? Say, a block of data is split into N pieces, and as long
> >> as M pieces survive in the network, we can regenerate the original
> >> data.
> >>
> >> There are many benefits to reducing the data size. It can save
> >> network and disk bandwidth, and thus reduce energy consumption.
> >> Computation power might be a concern, but we could use GPUs to
> >> encode and decode.
> >>
> >> But maybe the idea is stupid, or it's hard to reduce the data
> >> size. I would like to hear your comments.
> >>
> >> Thanks,
> >> Da
> 
> 
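[Editor's note: the M-of-N scheme described in the quoted message is erasure coding. As a minimal sketch of the idea only — not the HDFS-RAID implementation, which uses XOR and Reed-Solomon codes, and with hypothetical helper names — a single-parity code in Python shows how N data pieces plus one parity piece let any one lost piece be rebuilt from the survivors:]

```python
# Minimal sketch of M-of-N erasure coding using one XOR parity piece
# (an N-of-(N+1) code, RAID-5 style). Illustration only; HDFS-RAID
# itself uses XOR and Reed-Solomon codes over block stripes.
from functools import reduce


def xor(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length pieces."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(block: bytes, n_data: int) -> list:
    """Split a block into n_data pieces and append one XOR parity piece."""
    piece_len = -(-len(block) // n_data)            # ceiling division
    padded = block.ljust(piece_len * n_data, b"\x00")
    pieces = [padded[i * piece_len:(i + 1) * piece_len]
              for i in range(n_data)]
    parity = reduce(xor, pieces)                    # XOR of all data pieces
    return pieces + [parity]


def decode(pieces: list, lost_index: int) -> bytes:
    """Rebuild the piece at lost_index (data or parity) from the survivors
    and return the concatenated data pieces. The entry at lost_index is
    ignored, so it may be None. If the block was padded, the caller trims
    the result to the original length."""
    survivors = [p for i, p in enumerate(pieces) if i != lost_index]
    rebuilt = reduce(xor, survivors)                # XOR of survivors = lost piece
    restored = pieces[:lost_index] + [rebuilt] + pieces[lost_index + 1:]
    return b"".join(restored[:-1])                  # drop parity, keep data
```

With n_data = 4, a block becomes five pieces and survives the loss of any single one, at a storage overhead of 1.25x instead of the 3x of triple replication; Reed-Solomon generalizes this to tolerating several simultaneous losses.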
