flume-user mailing list archives

From Lin Ma <lin...@gmail.com>
Subject Re: beginner's question -- file source configuration
Date Mon, 09 Mar 2015 02:56:02 GMT
Thanks Ashish,

One further question on the HDFS sink. If I configure the destination
directory on HDFS with a year/month/day/hour pattern, will Flume
automatically place each event it receives into the matching directory,
creating new directories as time moves forward? Or do I have to set some
key/value headers on the event so the HDFS sink can recognize the event
time and write it into the appropriate time-based folder?
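As a sketch of the kind of configuration in question (agent, channel, and path names below are made up; per the Flume User Guide, the HDFS sink expands escapes like %Y/%m/%d/%H from the event's `timestamp` header, which either a timestamp interceptor or `hdfs.useLocalTimeStamp = true` can supply):

```properties
# Hypothetical agent "a1": the HDFS sink expands the %Y/%m/%d/%H escapes
# and creates the directories automatically as time advances.
a1.sinks.k1.type = hdfs
a1.sinks.k1.channel = c1
a1.sinks.k1.hdfs.path = hdfs://namenode/flume/events/%Y/%m/%d/%H
# The escapes are resolved from the event's "timestamp" header; either add
# a timestamp interceptor on the source, or let the sink use local time:
a1.sinks.k1.hdfs.useLocalTimeStamp = true
# Write plain text rather than the default SequenceFile:
a1.sinks.k1.hdfs.fileType = DataStream
```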

regards,
Lin

On Sun, Mar 8, 2015 at 6:32 PM, Ashish <paliwalashish@gmail.com> wrote:

> Your understanding is correct :)
>
> On Mon, Mar 9, 2015 at 6:54 AM, Lin Ma <linlma@gmail.com> wrote:
> > Thanks Ashish,
> >
> > Following your guidance, I found the instructions below and have a
> > further question to confirm with you. It seems we must close the files
> > and never touch them again for Flume to process them correctly. So is
> > it good practice to (1) let the application write log files the
> > existing way, e.g. rotating hourly or every 5 minutes, and (2) close
> > and move the finished files into another directory that the Flume
> > agent consumes as its Spooling Directory source?
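That two-step handoff can be sketched in shell (paths and file names here are made up); the key point is that `mv` within one filesystem is an atomic rename, so the spooling source never sees a half-written file:

```shell
# Simulate the handoff with throwaway directories.
SRC=$(mktemp -d)    # where the application writes and closes hourly logs
SPOOL=$(mktemp -d)  # the directory a Flume spooling source would watch

# The application finishes and closes an hourly log file...
printf '2015-03-09 02:00:01 INFO sample event\n' > "$SRC/app-2015030902.log"

# ...then we hand it to Flume with an atomic rename; after this point
# nothing ever writes to the file again.
mv "$SRC/app-2015030902.log" "$SPOOL/"
ls "$SPOOL"
```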
> >
> > “This source will watch the specified directory for new files, and will
> > parse events out of new files as they appear. ”
> >
> > "If a file is written to after being placed into the spooling
> > directory, Flume will print an error to its log file and stop
> > processing. If a file name is reused at a later time, Flume will
> > print an error to its log file and stop processing."
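The close-and-move workflow discussed above pairs with a spooling directory source along these lines (a sketch; the agent name and paths are hypothetical, property keys per the Flume User Guide):

```properties
# Hypothetical agent "a1": watch a drop directory for finished, immutable files.
a1.sources.r1.type = spooldir
a1.sources.r1.channels = c1
a1.sources.r1.spoolDir = /var/flume/spool
# Fully consumed files are renamed with this suffix so they are not re-read.
a1.sources.r1.fileSuffix = .COMPLETED
# Record the originating file name in an event header.
a1.sources.r1.fileHeader = true
```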
> >
> > regards,
> > Lin
> >
> > On Sun, Mar 8, 2015 at 12:23 AM, Ashish <paliwalashish@gmail.com> wrote:
> >>
> >> Please look at the following:
> >> Spooling Directory Source
> >> [http://flume.apache.org/FlumeUserGuide.html#spooling-directory-source]
> >> and
> >> HDFS Sink (http://flume.apache.org/FlumeUserGuide.html#hdfs-sink)
> >>
> >> The Spooling Directory Source needs immutable files, meaning a file
> >> must not be written to once it is being consumed. In short, your
> >> application cannot write to a file while Flume is reading it.
> >>
> >> The log format is not an issue, as long as you don't need it to be
> >> interpreted by Flume components. Since it's a log, I'm assuming one
> >> record per line, with a line separator at the end of each line.
> >>
> >> You can also look at the Exec source
> >> (http://flume.apache.org/FlumeUserGuide.html#exec-source) for tailing
> >> a file that is still being written by the application. The
> >> documentation at the links above covers the details.
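For completeness, a tail-based alternative might look like the sketch below (agent name, channel, and log path are hypothetical; note the User Guide warns that the exec source gives no delivery guarantees if the agent or the tail process dies):

```properties
# Hypothetical agent "a1": follow a log file the application is still writing.
a1.sources.r1.type = exec
a1.sources.r1.channels = c1
# tail -F keeps following across log rotation
a1.sources.r1.command = tail -F /var/log/myapp/app.log
# Restart the command if it exits.
a1.sources.r1.restart = true
```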
> >>
> >> HTH !
> >>
> >>
> >> On Sun, Mar 8, 2015 at 12:32 PM, Lin Ma <linlma@gmail.com> wrote:
> >> > Hi Flume masters,
> >> >
> >> > I want to install Flume on a box, consume a local log file as the
> >> > source, and send it to a remote HDFS sink. The log format is
> >> > private plain text (not Avro or JSON).
> >> >
> >> > I am reading the Flume guide with its many advanced source
> >> > configurations. For a plain local log file source, are there any
> >> > reference samples? And I am not sure whether Flume can consume a
> >> > local file while the application is still writing to it. Thanks.
> >> >
> >> > regards,
> >> > Lin
> >>
> >>
> >>
> >
> >
>
>
>
> --
> thanks
> ashish
>
> Blog: http://www.ashishpaliwal.com/blog
> My Photo Galleries: http://www.pbase.com/ashishpaliwal
>
