From: Joey Echeverria
To: "common-user@hadoop.apache.org"
Cc: "hdfs-user@hadoop.apache.org", "common-user@hadoop.apache.org"
Subject: Re: replicate data in HDFS with smarter encoding
Date: Mon, 18 Jul 2011 07:10:24 -0400

Facebook contributed some code to do something similar called HDFS RAID:

http://wiki.apache.org/hadoop/HDFS-RAID

-Joey

On Jul 18, 2011, at 3:41, Da Zheng wrote:

> Hello,
>
> It seems that data replication in HDFS is simply data copy among nodes. Has
> anyone considered to use a better encoding to reduce the data size? Say, a block
> of data is split into N pieces, and as long as M pieces of data survive in the
> network, we can regenerate original data.
>
> There are many benefits to reduce the data size. It can save network and disk
> benefit, and thus reduce energy consumption. Computation power might be a
> concern, but we can use GPU to encode and decode.
>
> But maybe the idea is stupid or it's hard to reduce the data size. I would like
> to hear your comments.
>
> Thanks,
> Da
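
For anyone who wants to see the "N pieces, any M survive" idea from the quoted message in concrete form, here is a minimal sketch of the simplest such scheme: a block is cut into k data pieces plus one XOR parity piece, so N = k + 1 and M = k. This is only an illustration and not code from HDFS RAID; the function names (encode, decode) and the zero-padding choice are my own assumptions. Tolerating more than one lost piece needs a stronger code such as Reed-Solomon.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(block: bytes, k: int) -> List[bytes]:
    """Split block into k equal data pieces plus one XOR parity piece."""
    piece_len = -(-len(block) // k)  # ceiling division
    padded = block.ljust(piece_len * k, b"\0")  # zero-pad the last piece
    pieces = [padded[i * piece_len:(i + 1) * piece_len] for i in range(k)]
    parity = reduce(xor_bytes, pieces)
    return pieces + [parity]


def decode(pieces: List[Optional[bytes]], k: int, length: int) -> bytes:
    """Rebuild the original block when at most one of the k+1 pieces is None."""
    missing = [i for i, p in enumerate(pieces) if p is None]
    if len(missing) > 1:
        raise ValueError("XOR parity can repair only one lost piece")
    if missing:
        # XOR of all surviving pieces equals the single missing piece.
        survivors = [p for p in pieces if p is not None]
        pieces = list(pieces)
        pieces[missing[0]] = reduce(xor_bytes, survivors)
    # Drop the parity piece and the padding.
    return b"".join(pieces[:k])[:length]
```

With k = 4 this stores 1.25 bytes per byte of data and survives the loss of any one piece, whereas 3x replication stores 3 bytes per byte and survives the loss of any two copies. The price is that repairing a lost piece means reading all the surviving pieces rather than copying one replica.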