From: "Doug Cutting (JIRA)"
To: hadoop-dev@lucene.apache.org
Date: Tue, 18 Jul 2006 05:01:15 -0700 (PDT)
Subject: [jira] Resolved: (HADOOP-341) Enhance distcp to handle *http* as a 'source protocol'.
Message-ID: <16486340.1153224075654.JavaMail.jira@brutus>
In-Reply-To: <18620957.1151927429856.JavaMail.jira@brutus>

     [ http://issues.apache.org/jira/browse/HADOOP-341?page=all ]

Doug Cutting resolved HADOOP-341.
---------------------------------

    Resolution: Fixed

I just committed this. Thanks!

> Enhance distcp to handle *http* as a 'source protocol'.
> -------------------------------------------------------
>
>          Key: HADOOP-341
>          URL: http://issues.apache.org/jira/browse/HADOOP-341
>      Project: Hadoop
>   Issue Type: Improvement
>   Components: util
>     Reporter: Arun C Murthy
>  Assigned To: Arun C Murthy
>      Fix For: 0.5.0
>
>  Attachments: distcp.patch, distcp2.patch, distcp_input_uri.patch
>
>
> Requirements:
>   Presently distcp recursively copies a directory from one dfs to another, i.e. both source and destination are of the *dfs* protocol.
>   Enhance it to handle *http* as the source protocol, i.e. support copying files from arbitrary http-based sources into the dfs.
>
> Design:
>   Follow distcp's current design: one map task per file which needs to be copied.
>   Caveat: distcp handles *recursive* copying by listing sub-directories. This is not as feasible with an http-based source, since things like 'fancy-indexing' might not be enabled on the web-server (for all sub-locations, recursively, too), and even if it is enabled it will mean tedious parsing of the served html to glean the sub-directories etc.
>   Hence the idea is to support an input file (via a -f option) which contains a list of the http-based urls which represent multiple source files.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira
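
The design quoted above (one copy task per file, with the http sources supplied through a -f list file) could be sketched roughly as follows. This is only an illustration in plain Java, not the code from the attached patches: the class `UrlListSketch`, the `CopyTask` type, the `planCopies` method, and the file format (one URL per line, blank lines ignored) are all assumptions made for the example.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the -f idea; NOT the code from distcp.patch. */
final class UrlListSketch {

    /** One unit of work: copy a single source URL to a single destination path. */
    static final class CopyTask {
        final URI source;
        final String destination;

        CopyTask(URI source, String destination) {
            this.source = source;
            this.destination = destination;
        }
    }

    /**
     * Turns the lines of a URL list file into one CopyTask per URL,
     * mirroring the "one map task per file" design.
     * Assumed format: one URL per line, blank lines skipped.
     */
    static List<CopyTask> planCopies(List<String> lines, String destDir) {
        List<CopyTask> tasks = new ArrayList<>();
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty()) {
                continue; // assumption: blank lines carry no meaning
            }
            URI uri;
            try {
                uri = new URI(line);
            } catch (URISyntaxException e) {
                throw new IllegalArgumentException("Not a valid URL: " + line, e);
            }
            if (!"http".equalsIgnoreCase(uri.getScheme())) {
                throw new IllegalArgumentException("Only http sources are handled here: " + line);
            }
            String path = uri.getPath(); // null for opaque URIs such as "http:foo"
            if (path == null) {
                throw new IllegalArgumentException("URL has no path: " + line);
            }
            // No directory listing is attempted: the file name comes from the URL itself.
            String name = path.substring(path.lastIndexOf('/') + 1);
            if (name.isEmpty()) {
                throw new IllegalArgumentException("URL does not name a file: " + line);
            }
            tasks.add(new CopyTask(uri, destDir + "/" + name));
        }
        return tasks;
    }
}
```

Each returned `CopyTask` stands in for the work one map task would do. Because every source file is listed explicitly, the sketch never has to parse a server-generated index page, which is the problem the caveat describes.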