Date: Tue, 26 Feb 2013 20:53:43 -0300
From: Pablo Musa <pablo@psafe.com>
To: <user@hadoop.apache.org>
Subject: Re: HDFS Backup for Hadoop Update
References: <512D3992.40300@psafe.com>
In-Reply-To: <512D3992.40300@psafe.com>

Following the idea of copying the data structure, I thought about rsync.

I could run rsync while the server is on and later, with the server
stopped, just transfer the diff, which would be much faster and would
reduce the system's offline time.
But I do not know whether Hadoop makes a lot of changes to the data
structure (blocks).
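
To make the two-pass idea concrete, here is a minimal sketch of what I have
in mind. It only builds the two rsync command lines and does not run them.
The function name and the paths are hypothetical, and I am assuming that
plain `rsync -a` plus `--delete` on the final pass is enough; whether that
is actually safe for the block directories is exactly the open question.

```python
import shlex


def rsync_passes(src, dst):
    """Return the rsync commands for a live pre-copy and a final offline sync."""
    # A trailing slash on the source makes rsync copy the directory contents
    # into dst instead of creating a nested directory.
    src = src.rstrip("/") + "/"
    # Pass 1: run while the DataNode is still up; copies the bulk of the data.
    live = ["rsync", "-a", src, dst]
    # Pass 2: run after the DataNode is stopped; transfers only what changed
    # and deletes files at dst that no longer exist at src.
    final = ["rsync", "-a", "--delete", src, dst]
    return live, final


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    for cmd in rsync_passes("/data/dfs/data", "/backup/dfs/data"):
        print(shlex.join(cmd))
```

The offline window would then be only as long as the second pass takes,
which depends on how much changed since the first one.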

Thanks again,
Pablo

On 02/26/2013 07:39 PM, Pablo Musa wrote:
> Hello guys,
> I am starting the update from Hadoop 0.20 to a newer version (2.0), which
> changes the HDFS format. I have read a lot of tutorials, and they say that
> data loss is possible (as expected). To avoid losing HDFS data, I will
> probably back up the whole HDFS structure (7TB per node). However, this is
> a huge amount of data, and copying it will take a long time, during which
> my service would be unavailable.
>
> I was thinking about a simple approach: copying all files to a different
> place.
> I tried to find a parallel file compressor to speed up the process, but
> could not find one.
>
> How did you guys do it?
> Is there some trick?
>
> Thank you in advance,
> Pablo Musa