Subject: Re: Managed File Transfer
From: Mohan Radhakrishnan <radhakrishnan.mohan@gmail.com>
To: user@hadoop.apache.org
Date: Wed, 9 Jul 2014 21:41:28 +0530

I am a beginner, but this seems to be similar to what I intend. The data
source will be external FTP or S3 storage.

"Spark Streaming can read data from HDFS, Flume, Kafka, Twitter and
ZeroMQ. You can also define your own custom data sources."

Thanks,
Mohan

On Wed, Jul 9, 2014 at 2:09 PM, Stanley Shi <sshi@gopivotal.com> wrote:
> There's a DistCp utility for this kind of purpose;
> there's also "Spring XD", but I am not sure if you want to use it.
>
> Regards,
> Stanley Shi
>
> On Mon, Jul 7, 2014 at 10:02 PM, Mohan Radhakrishnan
> <radhakrishnan.mohan@gmail.com> wrote:
>> Hi,
>> We used a commercial file-transfer and scheduler tool in clustered
>> mode. This was a traditional active-active cluster that supported
>> multiple protocols such as FTPS.
>>
>> Now I am interested in evaluating a distributed way of crawling FTP
>> sites and downloading files using Hadoop. Since we have to process
>> thousands of files, I thought Hadoop jobs could do it.
>>
>> Are Hadoop jobs used for this type of file transfer?
>>
>> Moreover, there is also a requirement for a scheduler. What is the
>> recommendation of the forum?
>>
>> Thanks,
>> Mohan
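For the "thousands of files" case discussed above, the core idea behind tools like DistCp is to take a list of source paths and divide the copy work across many map tasks. A rough, Hadoop-free sketch of that splitting step (all names here are illustrative, not Hadoop or DistCp API):

```python
# Sketch: spread a list of remote file paths across N worker tasks
# round-robin, so each task downloads roughly the same number of files.
# This only models the work-partitioning idea; it does no actual I/O.

def partition(paths, num_tasks):
    """Assign each path to one of num_tasks chunks, round-robin."""
    chunks = [[] for _ in range(num_tasks)]
    for i, path in enumerate(paths):
        chunks[i % num_tasks].append(path)
    return chunks

if __name__ == "__main__":
    paths = ["ftp://host/file%d.csv" % i for i in range(5)]
    for task_id, chunk in enumerate(partition(paths, 2)):
        print(task_id, chunk)
```

In a real job, each chunk would become the input split of one map task, which fetches its files and writes them to HDFS; a scheduler (e.g. Oozie, which the list often recommends for recurring Hadoop jobs) would trigger the crawl periodically.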