Date: Wed, 3 Nov 2010 15:18:18 -0400
Subject: Any projects to help with running MapReduce across physically distributed clusters?
From: Jason Smith
To: common-user@hadoop.apache.org

I am looking into the problem of running jobs that generate statistics over a large data set split across geographically separate clusters. Each cluster would hold a unique piece of the overall data set, because the network overhead of co-locating the data would be too high.

I searched for tools that might help orchestrate something like this but did not find anything. Are there any tools I'm missing that I should look into?

Thanks,
Jason