flume-user mailing list archives

From Mike Percy <mpe...@apache.org>
Subject Re: Can we treat a whole file as a Flume event?
Date Wed, 23 Jan 2013 21:18:27 GMT
Yep my bad, typo :)


On Wed, Jan 23, 2013 at 1:04 PM, Roshan Naik <roshan@hortonworks.com> wrote:

> That's SpoolDirectorySource.java... I thought you referred to SpoolingFileSource
> earlier. I assume that was a typo?
>
>
> On Wed, Jan 23, 2013 at 11:53 AM, Mike Percy <mpercy@apache.org> wrote:
>
>>
>> https://github.com/apache/flume/blob/trunk/flume-ng-core/src/main/java/org/apache/flume/source/SpoolDirectorySource.java
>>
>>
>> On Tue, Jan 22, 2013 at 9:23 PM, Roshan Naik <roshan@hortonworks.com> wrote:
>>
>>> Mike,
>>>    Where is the SpoolingFileSource that you are referring to?
>>>
>>> -roshan
>>>
>>>
>>> On Tue, Jan 22, 2013 at 6:39 PM, Mike Percy <mpercy@apache.org> wrote:
>>>
>>>> Hi Roshan,
>>>> Yep, in general I'd have concerns w.r.t. capacity planning and garbage
>>>> collector behavior for large events. Flume holds at least one event batch
>>>> in memory at a time, depending on the number of sources/sinks, and even
>>>> with a batch size of 1, if you have unpredictably large events, there is
>>>> nothing preventing an OutOfMemoryError in extreme cases. But if you plan
>>>> for capacity and test thoroughly, it can be made to work.
>>>>
>>>> Regards,
>>>> Mike
>>>>
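[Editor's note: a rough sketch of the capacity-planning knobs Mike refers to, as a Flume agent properties file. Property names are from the Flume 1.x memory channel docs; the channel name `ch1` and all values are placeholders to be sized against your largest expected event. `byteCapacity` may not exist in every 1.x release.]

```properties
# Bound the number of events the memory channel holds at once;
# transactionCapacity caps how many events a single batch may carry.
agent.channels.ch1.type = memory
agent.channels.ch1.capacity = 1000
agent.channels.ch1.transactionCapacity = 10
# Optionally bound total bytes held in the channel (128 MB here);
# available only in releases that support byteCapacity.
agent.channels.ch1.byteCapacity = 134217728
```

Note that with 100 MB events even a transactionCapacity of 10 implies roughly 1 GB of event bodies in flight, so the agent's JVM heap (-Xmx) has to be sized accordingly.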
>>>>
>>>> On Tue, Jan 22, 2013 at 3:38 PM, Roshan Naik <roshan@hortonworks.com> wrote:
>>>>
>>>>> I recall some discussion with regards to being cautious about the size
>>>>> of the events (in this case the file being moved), as Flume is not quite
>>>>> intended for large events. Mike, perhaps you can throw some light on
>>>>> that aspect?
>>>>>
>>>>>
>>>>> On Tue, Jan 22, 2013 at 12:17 AM, Mike Percy <mpercy@apache.org> wrote:
>>>>>
>>>>>> Check out the latest changes to SpoolingFileSource w.r.t.
>>>>>> EventDeserializers on trunk. You can deserialize a whole file that way
>>>>>> if you want. Whether that is a good idea depends on your use case,
>>>>>> though.
>>>>>>
>>>>>> It's on trunk, lacking user docs for the latest changes, but I will
>>>>>> try to hammer out updated docs soon. In the meantime, you can just
>>>>>> look at the code and read the comments.
>>>>>>
>>>>>> Regards,
>>>>>> Mike
>>>>>>
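[Editor's note: the spooling directory source on trunk selects its EventDeserializer via the `deserializer` property. A sketch, assuming a custom whole-file deserializer; the `com.example.WholeFileDeserializer` builder class below is hypothetical, standing in for whatever implementation you build against trunk.]

```properties
agent.sources.src1.type = spooldir
agent.sources.src1.spoolDir = /var/spool/flume-in
# Hypothetical: a custom EventDeserializer$Builder that emits one
# event per file instead of the default line-based deserializer.
agent.sources.src1.deserializer = com.example.WholeFileDeserializer$Builder
```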
>>>>>> On Monday, January 21, 2013, Nitin Pawar wrote:
>>>>>>
>>>>>>> You can't configure it to send the entire file in an event unless you
>>>>>>> have a fixed number of events in each of the files. Basically it reads
>>>>>>> the entire file into a channel and then starts writing.
>>>>>>>
>>>>>>> So as long as you can limit the events in the file, I think you can
>>>>>>> send the entire file as a transaction, but not as a single event.
>>>>>>> As far as I understand, Flume treats individual lines in the file as
>>>>>>> events.
>>>>>>>
>>>>>>> If you want to pull the entire file, then you may want to implement
>>>>>>> that with messaging queues, where you send an event to an ActiveMQ
>>>>>>> queue, and then your consumer pulls the file in one transaction with
>>>>>>> some other mechanism like FTP or SCP.
>>>>>>>
>>>>>>> Others will have better ideas; I am just suggesting a crude way to
>>>>>>> get the entire file as a single event.
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Jan 22, 2013 at 12:19 PM, Henry Ma <henry.ma.1986@gmail.com> wrote:
>>>>>>>
>>>>>>>> As far as I know, Directory Spooling Source will send the file line
>>>>>>>> by line as events, and File Roll Sink will receive these lines and
>>>>>>>> roll them up into a big file at a fixed interval. Is that right, and
>>>>>>>> can we configure it to send the whole file as an event?
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, Jan 22, 2013 at 1:22 PM, Nitin Pawar <nitinpawar432@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> why don't you use directory spooling ?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Tue, Jan 22, 2013 at 7:15 AM, Henry Ma <henry.ma.1986@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> When using Flume to collect log files, we want to just COPY the
>>>>>>>>>> original files from several servers to a central storage (Unix
>>>>>>>>>> file system), not to roll them up into a big file, because we must
>>>>>>>>>> record some attributes of the original file such as name, host,
>>>>>>>>>> path, timestamp, etc. Besides, we want to guarantee total
>>>>>>>>>> reliability: no file missed, no file duplicated.
>>>>>>>>>>
>>>>>>>>>> It seems that, in the Source, we must put a whole file (size may
>>>>>>>>>> be between 100KB and 100MB) into a Flume event; and in the Sink,
>>>>>>>>>> we must write each event to a single file.
>>>>>>>>>>
>>>>>>>>>> Is this practicable? Thanks!
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Best Regards,
>>>>>>>>>> Henry Ma
>>>>>>>>>>
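[Editor's note: Henry's requirement above -- whole file as the event body, file metadata in headers -- can be sketched outside Flume in a few lines of Java. `FileEvent` here is an illustrative stand-in for Flume's Event interface, not a real Flume class; it only shows why a 100 MB file becomes a 100 MB in-heap event.]

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

public class WholeFileEvent {
    // Illustrative stand-in for Flume's Event: a header map plus a byte[] body.
    static class FileEvent {
        final Map<String, String> headers;
        final byte[] body;
        FileEvent(Map<String, String> headers, byte[] body) {
            this.headers = headers;
            this.body = body;
        }
    }

    // Build one event from one file, recording name/host/path/timestamp
    // as headers and the entire file contents as the body.
    static FileEvent fromFile(Path file, String host) throws IOException {
        Map<String, String> headers = new HashMap<>();
        headers.put("file", file.getFileName().toString());
        headers.put("path", file.toAbsolutePath().toString());
        headers.put("host", host);
        headers.put("timestamp",
                Long.toString(Files.getLastModifiedTime(file).toMillis()));
        // The whole file becomes one event body -- this is exactly what
        // makes large files a heap-capacity concern on the agent.
        byte[] body = Files.readAllBytes(file);
        return new FileEvent(headers, body);
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("demo", ".log");
        Files.write(tmp, "hello flume".getBytes());
        FileEvent e = fromFile(tmp, "host-1");
        System.out.println(e.headers.get("file") + " " + e.body.length + " bytes");
        Files.delete(tmp);
    }
}
```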
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Nitin Pawar
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Best Regards,
>>>>>>>> Henry Ma
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Nitin Pawar
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
