From: hadoopman
Date: Mon, 29 Nov 2010 20:59:22 -0700
To: common-user@hadoop.apache.org
Subject: HDFS Rsync process??

We have two Hadoop clusters in two separate buildings. Both clusters load the same data from the same sources (the second cluster is for disaster recovery, DR). We're looking at how we can recover the primary cluster and bring it back up to date, since new data will keep feeding into the DR cluster in the meantime.

It's been suggested that we use rsync across the network. However, my concern is that the amount of data we would have to copy would take several days (at a minimum) to sync, even with our dual bonded 1 Gb network cards. I'm curious whether anyone has come up with a solution short of simply reloading the source logs into HDFS. Is there even a way to rsync two clusters and get them in sync?

I've been googling around but haven't found anything of substance yet.

Thanks!
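The "several days" concern can be sanity-checked with some quick arithmetic. Below is a minimal sketch, assuming the two bonded 1 Gb links give roughly 2 Gbit/s in aggregate, and that only a fraction of line rate (the hypothetical `efficiency` parameter) is achieved in practice. Depending on the bonding mode, a single TCP stream may only see one link's worth of bandwidth. Because new data keeps landing on the DR cluster while the copy runs, the primary only catches up if the effective copy rate exceeds the ingest rate. The data volume and ingest rate in the usage line are made-up figures for illustration, not numbers from this thread.

```python
def transfer_seconds(data_bytes: float, link_bits_per_s: float, efficiency: float = 0.7) -> float:
    """Seconds to move data_bytes over a link running at `efficiency` x line rate."""
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    return data_bytes * 8 / (link_bits_per_s * efficiency)


def catch_up_seconds(backlog_bytes: float, link_bits_per_s: float,
                     ingest_bytes_per_s: float, efficiency: float = 0.7) -> float:
    """Seconds until the copy catches up while new data keeps arriving at the source.

    Returns infinity if the effective copy rate does not exceed the ingest rate.
    """
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    copy_bytes_per_s = link_bits_per_s * efficiency / 8
    if copy_bytes_per_s <= ingest_bytes_per_s:
        return float("inf")  # backlog never shrinks
    return backlog_bytes / (copy_bytes_per_s - ingest_bytes_per_s)


if __name__ == "__main__":
    DAY = 86_400
    # Hypothetical: 50 TB backlog, ~2 Gbit/s bonded link, 5 TB/day of new data.
    print(transfer_seconds(50e12, 2e9) / DAY)
    print(catch_up_seconds(50e12, 2e9, 5e12 / DAY) / DAY)
```

With these hypothetical numbers the raw copy alone takes roughly 3.3 days, and continued ingest stretches the catch-up to nearly 5 days. This is consistent with the worry above, and shows why shrinking the amount of data that must cross the wire matters more than tuning the copy tool.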