flume-user mailing list archives

From Otis Gospodnetic <otis.gospodne...@gmail.com>
Subject Re: Import files from a directory on remote machine
Date Wed, 23 Apr 2014 13:48:18 GMT
Hi Jeff,

On Thu, Apr 17, 2014 at 1:11 PM, Jeff Lord <jlord@cloudera.com> wrote:

> Using the exec source with a tail -f is not considered a production
> solution.
> It mainly exists for testing purposes.
>

This statement surprised me.  Is that the general consensus among Flume
developers and users, or at Cloudera?

Is there an alternative recommended for production that provides equivalent
functionality?

Thanks,
Otis
--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/

>
>
> On Thu, Apr 17, 2014 at 7:03 AM, Laurance George <
> laurance.w.george@gmail.com> wrote:
>
>> If you can NFS-mount that directory onto the local machine running Flume,
>> it sounds like what you've listed out would work well.
>>
>>
>> On Thu, Apr 17, 2014 at 2:54 AM, Something Something <
>> mailinglists19@gmail.com> wrote:
>>
>>> If I am going to 'rsync' a file from a remote host & copy it to HDFS via
>>> Flume, then why use Flume?  I can rsync & then just do a 'hadoop fs -put',
>>> no?  I must be missing something.  I guess the only benefit of using Flume
>>> is that I can add Interceptors if I want to.  Current requirements don't
>>> need that.  We just want to copy data as is.
>>>
>>> Here's the real use case:  An application writes to an xyz.log file.
>>> Once this file gets over a certain size, it gets rolled over to xyz1.log
>>> and so on, kinda like Log4j.  What we really want is that as soon as a
>>> line gets written to xyz.log, it goes to HDFS via Flume.
>>>
>>> Can I do something like this?
>>>
>>> 1)  Share the log directory under Linux.
>>> 2)  Use
>>> test1.sources.mylog.type = exec
>>> test1.sources.mylog.command = tail -F /home/user1/shares/logs/xyz.log
>>>
>>> I believe this will work, but is this the right way?  Thanks for your
>>> help.
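>>>
>>> For completeness, the full agent definition I have in mind (channel and
>>> sink names are just placeholders) would be something like:
>>>
>>> test1.sources = mylog
>>> test1.channels = c1
>>> test1.sinks = k1
>>> test1.sources.mylog.type = exec
>>> test1.sources.mylog.command = tail -F /home/user1/shares/logs/xyz.log
>>> test1.sources.mylog.channels = c1
>>> test1.channels.c1.type = memory
>>> test1.sinks.k1.type = hdfs
>>> test1.sinks.k1.channel = c1
>>> test1.sinks.k1.hdfs.path = hdfs://namenode/flume/logs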
>>>
>>>
>>>
>>>
>>>
>>> On Wed, Apr 16, 2014 at 5:51 PM, Laurance George <
>>> laurance.w.george@gmail.com> wrote:
>>>
>>>> Agreed with Jeff.  Rsync + cron (if it needs to be regular) is
>>>> probably your best bet to ingest files from a remote machine that you only
>>>> have read access to.  But then again, you're sorta stepping outside the
>>>> use case of Flume at some level here, as rsync is now basically a part of
>>>> your Flume topology.  However, if you just need to back-fill old log data,
>>>> then this is perfect!  In fact, it's what I do myself.
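>>>>
>>>> As a rough sketch (paths and schedule are only examples), the cron entry
>>>> could be something like:
>>>>
>>>> */5 * * * * rsync -av user@remotehost:/var/log/logdir/ /var/flume/spool/
>>>>
>>>> with the spooling directory source watching /var/flume/spool.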
>>>>
>>>>
>>>> On Wed, Apr 16, 2014 at 8:46 PM, Jeff Lord <jlord@cloudera.com> wrote:
>>>>
>>>>> The spooling directory source runs as part of the agent.
>>>>> The source also needs write access to the files as it renames them
>>>>> upon completion of ingest. Perhaps you could use rsync to copy the files
>>>>> somewhere that you have write access to?
>>>>>
>>>>>
>>>>> On Wed, Apr 16, 2014 at 5:26 PM, Something Something <
>>>>> mailinglists19@gmail.com> wrote:
>>>>>
>>>>>> Thanks Jeff.  This is useful.  Can the spoolDir be on a different
>>>>>> machine?  We may have to set up a different process to copy files into
>>>>>> 'spoolDir', right?  Note: we have 'read only' access to these files.
>>>>>> Any recommendations about this?
>>>>>>
>>>>>>
>>>>>> On Wed, Apr 16, 2014 at 5:16 PM, Jeff Lord <jlord@cloudera.com> wrote:
>>>>>>
>>>>>>> http://flume.apache.org/FlumeUserGuide.html#spooling-directory-source
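>>>>>>>
>>>>>>> A minimal source stanza for it (agent and channel names here are just
>>>>>>> illustrative) would look something like:
>>>>>>>
>>>>>>> a1.sources.spool1.type = spooldir
>>>>>>> a1.sources.spool1.spoolDir = /var/flume/spool
>>>>>>> a1.sources.spool1.channels = c1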
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Apr 16, 2014 at 5:14 PM, Something Something <
>>>>>>> mailinglists19@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hello,
>>>>>>>>
>>>>>>>> Needless to say, I am a newbie to Flume, but I've got a basic flow
>>>>>>>> working in which I am importing a log file from my Linux box to HDFS.
>>>>>>>> I am using
>>>>>>>>
>>>>>>>> a1.sources.r1.command = tail -F /var/log/xyz.log
>>>>>>>>
>>>>>>>> which works like a stream of messages.  This is good!
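>>>>>>>>
>>>>>>>> For context, that line sits in a source definition roughly like this
>>>>>>>> (channel name is illustrative):
>>>>>>>>
>>>>>>>> a1.sources.r1.type = exec
>>>>>>>> a1.sources.r1.command = tail -F /var/log/xyz.log
>>>>>>>> a1.sources.r1.channels = c1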
>>>>>>>>
>>>>>>>> Now what I want to do is copy log files from a directory on a
>>>>>>>> remote machine on a regular basis.  For example:
>>>>>>>>
>>>>>>>> username@machinename:/var/log/logdir/<multiple files>
>>>>>>>>
>>>>>>>> One way to do it is to simply 'scp' files from the remote directory
>>>>>>>> into my box on a regular basis, but what's the best way to do this
>>>>>>>> in Flume?  Please let me know.
>>>>>>>>
>>>>>>>> Thanks for the help.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Laurance George
>>>>
>>>
>>>
>>
>>
>> --
>> Laurance George
>>
>
>
