From: andrea zonca
Date: Fri, 12 Jul 2013 23:43:19 +0200
To: user@hadoop.apache.org
Subject: Running hadoop for processing sources in full sky maps

Hi,

I have a few tens of full sky maps in binary format (FITS), about 600MB each. For each sky map I already have a catalog with the positions of a few thousand sources, e.g. stars, galaxies and radio sources.

For each source I would like to:
- open the full sky map
- extract the relevant section, typically 20MB or less
- run some statistics on it
- aggregate the outputs into a catalog

I would like to run Hadoop, possibly using Python via the streaming interface, to process them in parallel.

I think the input to the mapper should be the individual records of the catalogs; the Python mapper can then open the full sky map, do the processing and print the output to stdout.

Is this a reasonable approach?

If so, I need to configure Hadoop so that a full sky map is copied locally to the nodes that are processing any of its sources. How can I achieve that?

Also, what is the best way to feed the input data to Hadoop? For each source I have a reference to the full sky map, plus its latitude and longitude.

Thanks. I also posted this question on StackOverflow:
http://stackoverflow.com/questions/17617654/running-hadoop-for-processing-sources-in-full-sky-maps

Regards,
Andrea Zonca
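A minimal sketch of the per-record mapper described in the question, written for Python 3. It assumes each catalog line is tab-separated as "map_name<TAB>lat<TAB>lon" (a hypothetical layout), and load_cutout is a hypothetical placeholder for the real FITS extraction code, which is not shown. The script reads records from stdin and writes one tab-separated line per source, so Hadoop streaming would treat the map name as the key.

```python
#!/usr/bin/env python
"""Sketch of a Hadoop streaming mapper computing per-source statistics.

Assumptions (hypothetical, not from the original post):
  * each input line is "map_name<TAB>lat_deg<TAB>lon_deg";
  * load_cutout() is a placeholder for real FITS cutout extraction.
"""
import math
import statistics
import sys


def parse_record(line):
    """Split one catalog record into (map_name, lat, lon)."""
    map_name, lat, lon = line.rstrip("\n").split("\t")
    return map_name, float(lat), float(lon)


def summarize(values):
    """Return (count, mean, population standard deviation) of the pixels."""
    values = list(values)
    if not values:
        return 0, math.nan, math.nan  # empty cutout: no statistics
    return len(values), statistics.mean(values), statistics.pstdev(values)


def load_cutout(map_name, lat, lon):
    """Hypothetical placeholder: return pixel values around (lat, lon).

    Real code would open the local copy of map_name (a FITS file)
    and extract the relevant section around the source.
    """
    raise NotImplementedError("plug in FITS cutout extraction here")


def run_mapper(lines, loader):
    """Yield one tab-separated output line per catalog record.

    The map name comes first so Hadoop streaming uses it as the key.
    """
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        map_name, lat, lon = parse_record(line)
        n, mean, std = summarize(loader(map_name, lat, lon))
        yield f"{map_name}\t{lat}\t{lon}\t{n}\t{mean:.6g}\t{std:.6g}"


if __name__ == "__main__" and sys.argv[1:] == ["map"]:
    # Run as a streaming mapper: records on stdin, results on stdout.
    for out_line in run_mapper(sys.stdin, load_cutout):
        print(out_line)
```

The script only reads stdin when it is invoked with the argument "map" (for example as the mapper command "sky_mapper.py map", a hypothetical file name). Without that argument, the functions can be imported and tested on their own. Because the map name is emitted first, all sources from the same sky map would reach the same reducer together if a reduce step is used for the aggregation.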