Subject: Re: Import files from a directory on remote machine
From: Otis Gospodnetic <otis.gospodnetic@gmail.com>
Date: Wed, 23 Apr 2014 09:48:18 -0400
To: user@flume.apache.org
Hi Jeff,

On Thu, Apr 17, 2014 at 1:11 PM, Jeff Lord <jlord@cloudera.com> wrote:

> Using the exec source with a tail -f is not considered a production solution.
> It mainly exists for testing purposes.

This statement surprised me. Is that the general consensus among Flume developers or users or at Cloudera?

Is there an alternative recommended for production that provides equivalent functionality?

Thanks,
Otis
--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/
On Thu, Apr 17, 2014 at 7:03 AM, Laurance George <laurance.w.george@gmail.com> wrote:

If you can NFS mount that directory to your local machine with Flume, it sounds like what you've listed out would work well.
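For anyone trying the NFS route, a rough sketch of the mount step; the mount point and options below are made up for illustration, and the remote host has to export the directory first:

# On the machine running the Flume agent, mount the remote log directory read-only
# (assumes machinename exports /var/log/logdir via NFS)
sudo mkdir -p /mnt/remote-logs
sudo mount -t nfs -o ro machinename:/var/log/logdir /mnt/remote-logs

# A tail -F exec source can then point at files under /mnt/remote-logs.
# A read-only mount would not suit the spooling directory source, which
# needs write access to rename files it has finished ingesting.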
On Thu, Apr 17, 2014 at 2:54 AM, Something Something <mailinglists19@gmail.com> wrote:

If I am going to 'rsync' a file from the remote host & copy it to HDFS via Flume, then why use Flume? I can rsync & then just do a 'hadoop fs -put', no? I must be missing something. I guess the only benefit of using Flume is that I can add Interceptors if I want to. Current requirements don't need that. We just want to copy data as is.

Here's the real use case: an application is writing to an xyz.log file. Once this file gets over a certain size it gets rolled over to xyz1.log & so on, kind of like Log4j. What we really want is that as soon as a line gets written to xyz.log, it should go to HDFS via Flume.

Can I do something like this?

1) Share the log directory under Linux.
2) Use

test1.sources.mylog.type = exec
test1.sources.mylog.command = tail -F /home/user1/shares/logs/xyz.log

I believe this will work, but is this the right way? Thanks for your help.
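For reference, a complete agent built around that two-line exec snippet could look roughly like the following. This is only a sketch: the channel and sink names, the HDFS path, and the capacity and roll settings are invented for illustration, and the exec source still carries the caveat Jeff raises above, since events sitting in the memory channel are lost if the agent dies.

# Hypothetical agent "test1": exec source -> memory channel -> HDFS sink
test1.sources = mylog
test1.channels = memch
test1.sinks = hdfssink

# Source from the thread: tail the shared log file (no delivery guarantees)
test1.sources.mylog.type = exec
test1.sources.mylog.command = tail -F /home/user1/shares/logs/xyz.log
test1.sources.mylog.channels = memch

# In-memory channel; capacity numbers are placeholders
test1.channels.memch.type = memory
test1.channels.memch.capacity = 10000
test1.channels.memch.transactionCapacity = 100

# HDFS sink; path and roll interval are placeholders
test1.sinks.hdfssink.type = hdfs
test1.sinks.hdfssink.channel = memch
test1.sinks.hdfssink.hdfs.path = hdfs://namenode:8020/flume/xyz-logs
test1.sinks.hdfssink.hdfs.fileType = DataStream
test1.sinks.hdfssink.hdfs.rollInterval = 300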
On Wed, Apr 16, 2014 at 5:51 PM, Laurance George <laurance.w.george@gmail.com> wrote:

Agreed with Jeff. Rsync + cron (if it needs to be regular) is probably your best bet to ingest files from a remote machine that you only have read access to. But then again, you're sort of stepping outside the use case of Flume at some level here, as rsync is now basically a part of your Flume topology. However, if you just need to back-fill old log data then this is perfect! In fact, it's what I do myself.
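A rough sketch of that rsync-plus-cron idea; every path, schedule, and filename below is invented for illustration. The landing directory could feed a Flume spooling directory source, or, as mentioned above in the thread, skip Flume entirely and go straight to HDFS with 'hadoop fs -put':

#!/bin/sh
# pull-logs.sh (hypothetical): copy the rolled logs from the read-only remote directory
rsync -av username@machinename:/var/log/logdir/ /var/flume/incoming/

# Example crontab entry to run it every five minutes:
#   */5 * * * * /usr/local/bin/pull-logs.sh

# Back-fill alternative without Flume: push the copies straight to HDFS
#   hadoop fs -put /var/flume/incoming/xyz1.log /logs/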
On Wed, Apr 16, 2014 at 8:46 PM, Jeff Lord <jlord@cloudera.com> wrote:

The spooling directory source runs as part of the agent.
The source also needs write access to the files as it renames them upon completion of ingest. Perhaps you could use rsync to copy the files somewhere that you have write access to?
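For later readers, a minimal spooling directory source definition might look like this; the agent name, channel name, and directory are placeholders, and the user guide link Jeff posted below has the full option list:

# Hypothetical agent "a1": watch a local directory that rsync fills
a1.sources = spool
a1.channels = c1

a1.sources.spool.type = spooldir
a1.sources.spool.spoolDir = /var/flume/incoming
a1.sources.spool.fileHeader = true
a1.sources.spool.channels = c1

# Durable file channel rather than memory, since this is for production use
a1.channels.c1.type = file

# Ingested files are renamed with a .COMPLETED suffix by default,
# which is why the source needs write access to the directory.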
On Wed, Apr 16, 2014 at 5:26 PM, Something Something <mailinglists19@gmail.com> wrote:

Thanks Jeff. This is useful. Can the spoolDir be on a different machine? We may have to set up a different process to copy files into 'spoolDir', right? Note: we have 'read only' access to these files. Any recommendations about this?
On Wed, Apr 16, 2014 at 5:16 PM, Jeff Lord <jlord@cloudera.com> wrote:

http://flume.apache.org/FlumeUserGuide.html#spooling-directory-source
On Wed, Apr 16, 2014 at 5:14 PM, Something Something <mailinglists19@gmail.com> wrote:

Hello,

Needless to say I am a newbie to Flume, but I've got a basic flow working in which I am importing a log file from my Linux box to HDFS. I am using

a1.sources.r1.command = tail -F /var/log/xyz.log

which is working like a stream of messages. This is good!

Now what I want to do is copy log files from a directory on a remote machine on a regular basis. For example:

username@machinename:/var/log/logdir/<multiple files>

One way to do it is to simply 'scp' files from the remote directory into my box on a regular basis, but what's the best way to do this in Flume? Please let me know.

Thanks for the help.
--
Laurance George