From: Pablo Musa
Date: Tue, 26 Feb 2013 20:53:43 -0300
Subject: Re: HDFS Backup for Hadoop Update

Following the idea of making a copy of the data structure, I thought about
rsync. I could run rsync once while the server is on and later, with the
server stopped, apply only the diff. That would be much faster and would
reduce the system's off-line time.

But I do not know whether Hadoop makes a lot of changes to the data
structure (blocks).

Thanks again,
Pablo

On 02/26/2013 07:39 PM, Pablo Musa wrote:
> Hello guys,
> I am starting the update from hadoop 0.20 to a newer version (2.0), which
> changes the HDFS format. I read a lot of tutorials and they say that data
> loss is possible (as expected). In order to avoid HDFS data loss I will
> probably back up the whole HDFS structure (7TB per node). However, this is
> a huge amount of data and it will take a lot of time, during which my
> service would be unavailable.
>
> I was thinking about a simple approach: copying all files to a different
> place.
> I tried to find some parallel file compressor to speed up the process, but
> could not find one.
>
> How did you guys do it?
> Is there some trick?
>
> Thank you in advance,
> Pablo Musa
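
To make the two-pass idea from the top of this message concrete (one long copy while the server is on, then a short pass with the server stopped that transfers only what changed), here is a minimal sketch in plain Python. It is not rsync and not anything Hadoop provides: sync_pass and _needs_copy are hypothetical helpers, and the change test (file size plus modification time) only imitates rsync's default quick check. It assumes the second pass is run with the DataNode stopped, so nothing changes underneath it.

```python
import os
import shutil


def _needs_copy(src_file, dst_file):
    """Return True if dst_file is missing or differs in size or mtime."""
    if not os.path.exists(dst_file):
        return True
    s, d = os.stat(src_file), os.stat(dst_file)
    # Whole-second mtime comparison, to tolerate filesystems that store
    # timestamps with different precision.
    return s.st_size != d.st_size or int(s.st_mtime) != int(d.st_mtime)


def sync_pass(src, dst):
    """Bring dst in line with src and return the relative paths copied.

    Files that are new or changed in src are copied; files that no longer
    exist in src are deleted from dst. Empty directories left behind in
    dst are not pruned in this sketch.
    """
    copied = []
    seen = set()
    for root, _dirs, files in os.walk(src):
        rel_root = os.path.relpath(root, src)
        os.makedirs(os.path.normpath(os.path.join(dst, rel_root)), exist_ok=True)
        for name in files:
            rel = os.path.normpath(os.path.join(rel_root, name))
            seen.add(rel)
            src_file = os.path.join(src, rel)
            dst_file = os.path.join(dst, rel)
            if _needs_copy(src_file, dst_file):
                # copy2 also copies the modification time, which the
                # next pass relies on to skip unchanged files.
                shutil.copy2(src_file, dst_file)
                copied.append(rel)

    # Remove files from the backup that were deleted at the source.
    for root, _dirs, files in os.walk(dst):
        rel_root = os.path.relpath(root, dst)
        for name in files:
            rel = os.path.normpath(os.path.join(rel_root, name))
            if rel not in seen:
                os.remove(os.path.join(dst, rel))
    return sorted(copied)
```

The first call to sync_pass(src, dst) copies everything and can run while the service is up. The second call, made during the maintenance window, only touches files that were added, changed, or removed in between, so the off-line time depends on how much the block directory changed rather than on its total size.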