flume-user mailing list archives

From Otis Gospodnetic <otis.gospodne...@gmail.com>
Subject Re: AWS S3 flume source
Date Fri, 01 Aug 2014 07:51:13 GMT
Hi,

On Fri, Aug 1, 2014 at 4:52 AM, Jonathan Natkins <natty@streamsets.com>
wrote:

> Hey all,
>
> I created a JIRA for this:
> https://issues.apache.org/jira/browse/FLUME-2437
>

Thanks!  Should Fix Version be set to the next Flume release version?

> I thought I'd start working on one myself, which can hopefully be
> contributed back. I'm curious: do you have particular requirements? Based
> on the emails in this thread, it sounds like the original goal was to have
> something that's like a SpoolDirectorySource that just picks up new files
> from S3. Is that accurate?
>

Yes, I think so.  We need to be able to:
* fetch data (logs, to pull them into Logsene
<http://sematext.com/logsene/>) from S3 periodically (e.g. every 1 min,
every 5 min, etc.)
* fetch data from multiple S3 buckets
* associate an S3 bucket with a user/token/key
* dynamically (i.e. without editing/writing config files stored on disk)
add new S3 buckets from which data should be fetched
* dynamically (i.e. without editing/writing config files stored on disk)
stop fetching data from some S3 buckets
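
To make the last two points more concrete, here is a rough sketch of the
bookkeeping they imply. It is only an illustration, not a proposed design:
S3BucketRegistry, addBucket, removeBucket and pollOnce are hypothetical names,
and the "lister" argument is a stand-in for whatever real S3 listing call the
source would end up using. Nothing here touches Flume or the AWS SDK.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BiFunction;

/** Hypothetical sketch: none of these names come from Flume or the AWS SDK. */
final class S3BucketRegistry {
    // bucket name -> user/token/key associated with that bucket
    private final Map<String, String> credentials = new ConcurrentHashMap<>();
    // bucket name -> object keys that were already handed out
    private final Map<String, Set<String>> seen = new ConcurrentHashMap<>();

    /** Start fetching from a bucket at runtime, with no config file edit. */
    void addBucket(String bucket, String token) {
        credentials.put(bucket, token);
        seen.putIfAbsent(bucket, ConcurrentHashMap.newKeySet());
    }

    /** Stop fetching from a bucket at runtime. */
    void removeBucket(String bucket) {
        credentials.remove(bucket);
        seen.remove(bucket);
    }

    /** Snapshot of the buckets currently registered. */
    Set<String> buckets() {
        return new HashSet<>(credentials.keySet());
    }

    /**
     * One polling pass over all registered buckets. The lister is an
     * assumption: given (bucket, token) it returns the object keys that
     * are currently in that bucket. Only keys not seen before are returned,
     * formatted as "bucket/key".
     */
    List<String> pollOnce(BiFunction<String, String, List<String>> lister) {
        List<String> fresh = new ArrayList<>();
        for (Map.Entry<String, String> e : credentials.entrySet()) {
            String bucket = e.getKey();
            Set<String> known = seen.get(bucket);
            if (known == null) {
                continue; // bucket was removed while this pass was running
            }
            for (String key : lister.apply(bucket, e.getValue())) {
                if (known.add(key)) {
                    fresh.add(bucket + "/" + key);
                }
            }
        }
        return fresh;
    }
}
```

The periodic part (every 1 min, every 5 min) could then be a matter of calling
pollOnce from something like a ScheduledExecutorService at a fixed rate, while
addBucket and removeBucket are driven by whatever mechanism replaces the
on-disk config. How that mechanism should look is still the open question.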


> Would you need to be able to pull files from multiple S3 directories with
> the same source?
>

I think the above addresses this question.


> Thanks,
> Natty
>

Thanks!

Otis
--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/



>
>
> On Thu, Jul 31, 2014 at 4:58 PM, Otis Gospodnetic <
> otis.gospodnetic@gmail.com> wrote:
>
>> +1 for seeing S3Source, starting with a JIRA issue.
>>
>> But being able to dynamically add/remove S3 buckets from which to pull
>> data seems important.
>>
>> Any suggestions for how to approach that?
>>
>> Otis
>> --
>> Performance Monitoring * Log Analytics * Search Analytics
>> Solr & Elasticsearch Support * http://sematext.com/
>>
>>
>> On Thu, Jul 31, 2014 at 9:14 PM, Hari Shreedharan <
>> hshreedharan@cloudera.com> wrote:
>>
>>> Please go ahead and file a jira. If you are willing to submit a patch,
>>> you can post it on the jira.
>>>
>>> Viral Bajaria wrote:
>>>
>>>
>>> I have a similar use case that cropped up yesterday. I saw the archive
>>> and found that there was a recommendation to build it as Sharninder
>>> suggested.
>>>
>>> For now, I went down the route of writing a python script which
>>> downloads from S3 and puts the files in a directory which is
>>> configured to be picked up via a spooldir.
>>>
>>> I would prefer to get a direct S3 source, and maybe we could
>>> collaborate on it and open-source it. Let me know if you prefer that
>>> and we can work directly on it by creating a JIRA.
>>>
>>> Thanks,
>>> Viral
>>>
>>>
>>>
>>> On Thu, Jul 31, 2014 at 10:26 AM, Hari Shreedharan
>>> <hshreedharan@cloudera.com> wrote:
>>>
>>>     In both cases, Sharninder is right :)
>>>
>>>     Sharninder wrote:
>>>
>>>
>>>
>>>     As far as I know, there is no (open source) implementation of an S3
>>>     source, so yes, you'll have to implement your own. You'll have to
>>>     implement a PollableSource, and the dev documentation has an outline
>>>     that you can use. You can also look at the existing ExecSource and
>>>     work your way up.
>>>
>>>     As far as I know, there is no way to configure Flume without using the
>>>     configuration file.
>>>
>>>
>>>
>>>     On Thu, Jul 31, 2014 at 7:57 PM, Paweł <prog88@gmail.com> wrote:
>>>
>>>         Hi,
>>>         I'm wondering if Flume is able to read directly from S3.
>>>
>>>         I'll describe my case. I have log files stored in AWS S3. I have
>>>         to periodically fetch new S3 objects and read log lines from them.
>>>         Then the log lines (events) are processed in Flume's standard way
>>>         (as with other sources).
>>>
>>>         *1) Is there any way to fetch S3 objects, or do I have to write my own
>>>         Source?*
>>>
>>>
>>>         There is also a second case. I want the Flume configuration to be
>>>         dynamic. Flume sources can change over time. New AWS keys and S3
>>>         buckets can be added or deleted.
>>>
>>>         *2) Is there any other way to configure Flume than by a static
>>>         configuration file?*
>>>
>>>         --
>>>         Paweł Róg
>>>
>>>
>>>
>>
>
