Date: Tue, 7 Jul 2009 13:02:02 +0200
From: Erik Forsberg <forsberg@opera.com>
To: common-user@hadoop.apache.org
Subject: Copy files https -> HDFS

Hi!
I have a list of files residing on an https server (which requires authentication, either username/password or a client certificate) that I want to copy into HDFS for later Map/Reduce processing. It's a bunch of rather large files, so I'd like to do the copying in parallel.

I would guess this has been done before? Is there example code anywhere?

I can imagine creating a map-only job that takes the list of files as its input, but how do I easily write to HDFS from a mapper?

Thanks,
\EF
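P.S. To be concrete, here's the kind of map-only job I had in mind -- an untested sketch against the old org.apache.hadoop.mapred API, where each map task gets one URL per input line and streams the download straight into HDFS via FileSystem.create(). The fetchOverHttps() helper is a placeholder for the actual authenticated https download, and the /incoming destination directory is made up:

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

// Sketch of a map-only job (numReduceTasks = 0): input is a text file
// listing one URL per line; each mapper fetches its URLs over https and
// writes the bytes into HDFS as a side effect. Nothing is emitted to the
// OutputCollector, so the job would use NullOutputFormat.
public class HttpsToHdfsMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, Text> {

    private FileSystem fs;

    @Override
    public void configure(JobConf conf) {
        try {
            fs = FileSystem.get(conf);  // handle to the cluster's HDFS
        } catch (IOException e) {
            throw new RuntimeException("Cannot connect to HDFS", e);
        }
    }

    @Override
    public void map(LongWritable key, Text url,
                    OutputCollector<Text, Text> out, Reporter reporter)
            throws IOException {
        // Destination path: /incoming/<last path component of the URL>
        // ("/incoming" is just an example directory).
        Path dest = new Path("/incoming", new Path(url.toString()).getName());

        InputStream in = fetchOverHttps(url.toString());  // placeholder
        OutputStream hdfsOut = fs.create(dest);
        try {
            byte[] buf = new byte[64 * 1024];
            int n;
            while ((n = in.read(buf)) > 0) {
                hdfsOut.write(buf, 0, n);
                reporter.progress();  // keep the task from timing out on large files
            }
        } finally {
            in.close();
            hdfsOut.close();
        }
    }

    // Placeholder: would open an HttpsURLConnection with either basic auth
    // or a client certificate and return its input stream.
    private InputStream fetchOverHttps(String url) throws IOException {
        throw new UnsupportedOperationException("not implemented in this sketch");
    }
}
```

Is FileSystem.get() inside a mapper's configure() the right way to get at HDFS, or is there a better pattern for this?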