Subject: Re: Copy files from remote folder to HDFS
From: Nitin Pawar
To: user@hadoop.apache.org
Date: Fri, 25 Jan 2013 13:14:18 +0530

If this is a one-time activity, then just download the Hadoop binaries from Apache,
replace hdfs-site.xml and core-site.xml with the ones from your Hadoop cluster,
allow this machine to connect to the Hadoop cluster,
and then you can do it with the Hadoop command-line tools (a rough sketch is at the bottom of this mail).

On Fri, Jan 25, 2013 at 1:01 PM, Mahesh Balija wrote:

> Hi Panshul,
>
> I am also working on a similar requirement. One approach is to mount your
> remote folder on your Hadoop master node and then write a simple shell
> script, scheduled with crontab, to copy the files to HDFS (a sketch
> follows below).
>
> I believe Flume is the wrong choice here: Flume is a data collection and
> aggregation framework, NOT a file transfer tool, and may NOT be a good fit
> when you actually want to copy the files as-is onto your cluster (NOT 100%
> sure, as I am still working on that myself).
>
> Thanks,
> Mahesh Balija,
> CalsoftLabs.
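A minimal sketch of the crontab approach Mahesh describes, assuming the remote folder is already mounted at /mnt/remote-json and the files should land under /data/incoming in HDFS (both paths, and the script location, are placeholders):

    #!/bin/bash
    # copy_to_hdfs.sh - copy newly arrived JSON files from the mounted
    # remote folder into HDFS, then move them aside locally so they are
    # not copied again on the next run.
    SRC=/mnt/remote-json           # mounted remote folder (placeholder)
    DONE=/mnt/remote-json/done     # local archive of already-copied files
    DST=/data/incoming             # target HDFS directory, assumed to exist

    mkdir -p "$DONE"

    for f in "$SRC"/*.json; do
        [ -e "$f" ] || continue                # no files matched the glob
        if hadoop fs -put "$f" "$DST"/; then   # -put refuses to overwrite existing files
            mv "$f" "$DONE"/
        fi
    done

Scheduled from crontab, for example to run every day at 02:00:

    0 2 * * * /home/hadoop/copy_to_hdfs.sh >> /var/log/copy_to_hdfs.log 2>&1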
>
> On Fri, Jan 25, 2013 at 6:39 AM, Panshul Whisper wrote:
>
>> Hello,
>>
>> I am trying to copy JSON files from a remote folder (a folder on my local
>> system, a Cloud Files folder, or a folder on an S3 server) to the HDFS of
>> a cluster running at a remote location. The job-submitting application is
>> based on Spring Hadoop.
>>
>> Can someone please suggest, or point me in the right direction for, the
>> best option to achieve this task:
>> 1. Use Spring Integration data pipelines to poll the folders for files
>> and copy them to HDFS as they arrive in the source folder. I have tried
>> to implement the solution from the Spring Data book, but it does not run,
>> and I have no idea what is wrong as it generates no logs.
>>
>> 2. Use some other scripted method to transfer the files.
>>
>> The main requirement: I need to transfer files from a remote folder to
>> HDFS every day at a fixed time for processing in the Hadoop cluster.
>> These files are collected from various sources into the remote folders.
>>
>> Please suggest an efficient approach. I have been searching and have
>> found a lot of approaches, but I am unable to decide which will work
>> best, as this transfer needs to be as fast as possible. The files to be
>> transferred will be roughly 10 GB of JSON files in total, each no more
>> than 6 KB.
>>
>> Thanking you,
>>
>> --
>> Regards,
>> Ouch Whisper
>> 010101010101
>

--
Nitin Pawar
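A sketch of the command-line approach from the top of this mail, run on the machine that holds the files; version numbers, paths, and the config source location are placeholders, and the copied core-site.xml and hdfs-site.xml must come from the target cluster:

    # unpack the Apache Hadoop binaries and put them on the PATH
    tar xzf hadoop-1.0.4.tar.gz -C /opt
    export HADOOP_HOME=/opt/hadoop-1.0.4
    export PATH=$HADOOP_HOME/bin:$PATH

    # drop in the cluster's configuration so the client knows where the
    # NameNode lives (conf/ on Hadoop 1.x; etc/hadoop/ on 2.x)
    cp /path/to/cluster/core-site.xml /path/to/cluster/hdfs-site.xml $HADOOP_HOME/conf/

    # sanity check that the remote HDFS is reachable
    hadoop fs -ls /

    # one-time recursive copy of the local JSON folder into HDFS
    hadoop fs -put /local/json-folder /data/incoming

If this machine cannot reach the NameNode and DataNode ports, the -ls check will fail before any data is copied, which is why the cluster has to allow connections from it first.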