flume-user mailing list archives

From shekhar sharma <shekhar2...@gmail.com>
Subject Re: Source and Sink running on different machine
Date Fri, 29 Jun 2012 17:01:08 GMT
Hey Inder,
I would like to know whether the file or JDBC channel is a persistent channel.
I have not gone through this fully. It would be great if you could please
elaborate on this with respect to performance. Is it more efficient than the
recoverable memory channel? I am asking with the failover mechanism of a Flume
agent in mind.

Thanks for your information.

Regards,
Som
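
For context on the question above: in Flume 1.x, durability is chosen per channel through its `type` property. The fragment below is a minimal, untested sketch; the agent name `agent1`, the channel names and the directory paths are assumptions made up for illustration. It declares a file channel and a JDBC channel side by side. Both are documented as persisting events, but their relative performance is something to measure on your own setup.

```properties
# Hypothetical agent and channel names; the paths are placeholders.
agent1.channels = fileCh jdbcCh

# File channel: events are written to local disk and survive an agent restart.
agent1.channels.fileCh.type = file
agent1.channels.fileCh.checkpointDir = /var/flume/checkpoint
agent1.channels.fileCh.dataDirs = /var/flume/data

# JDBC channel: events are stored in a database (embedded Derby by default).
agent1.channels.jdbcCh.type = jdbc
```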

On Fri, Jun 29, 2012 at 10:24 PM, shekhar sharma <shekhar2581@gmail.com> wrote:

> Hello Mohammad,
> I am not denying your statement. Again, many thanks for making my thought
> process clear. I have read somewhere that Cloudera's Flume (0.9.x) supports
> this, but I am not sure.
>
> Now, if we are configuring multiple channels, then I don't think they have
> to be part of a single agent, conceptually.
> Let's say you have two machines, machine1 and machine2.
>
> On machine2, Hadoop is running in pseudo-distributed mode.
> Machine1 is collecting events; it needs to send them to a logger
> sink (on its own machine) and also to an HDFS sink.
>
> If I wanted to achieve this, I would probably do
> something like this:
>
> I have taken the source and sink just for illustration:
>
> Machine1: AvroSource1------------>logger sink
>                                     |
>                                     |---------->Avro sink1
>
> Machine2: AvroSource2---------> HDFS sink
>
> As you can see, there are two hops:
>
> AvroSource1--->channel1--->Avro Sink1--->AvroSource2--->Channel--->HDFS
> sink
>
>
> And if I could do something like this instead:
>
> Machine1: AvroSource1------>loggerSink
>
> Machine2: AvroSink--->HDFS sink
>
> Now AvroSource1--->AvroSink1--->Channel--->HDFS Sink
>
> Does it make sense? I really want to try this out, but my system has been
> down for the last 3 hours, so I could not.
> Let's see what the experts have to say.
>
> Regards,
> Som Shekhar
>
> On Fri, Jun 29, 2012 at 6:26 PM, Mohammad Tariq <dontariq@gmail.com> wrote:
>
>> Hello Shekhar,
>>
>>          By atomic I meant that the components of a particular agent
>> (the source, all the channels and all the sinks) together form an
>> agent. It is not possible to run the source separately from the sink.
>> But that is just my understanding. I am still new to Flume and may be
>> wrong. (Please correct me if that is the case.)
>>          And as far as fanning out is concerned, although a source
>> is connected to several channels and in turn to several sinks, all of
>> these are still parts of the same agent.
>>
>> Regards,
>>     Mohammad Tariq
>>
>>
>> On Fri, Jun 29, 2012 at 6:06 PM, shekhar sharma <shekhar2581@gmail.com>
>> wrote:
>> > The motivation behind this configuration is that I have developed my own
>> > sink, specifically for Esper (to run EPL queries on top of Flume). If I
>> > run the sink on the same machine, it might be slower, since I am running
>> > complex EPL queries, so I want to dedicate a separate machine to the sink.
>> >
>> > Even though the connection is made, I am still trying to send events.
>> > Once that is done I will update this thread.
>> >
>> > Now, as you say that agents are atomic in nature, how is it possible
>> > to have a fan-out mechanism (channel multiplexing), where a single
>> > source sends events to different channels, which are in turn connected
>> > to different sinks? If agents are atomic, then all the sinks have to be
>> > on a single machine...
>> >
>> > For example:
>> > Source --->channel1--->Hdfs sink
>> >            ----> channel2--> cassandra sink
>> >
>> > Well one can argue that we can have multiple hops to the sinks.
>> >
>> > Regards,
>> > Som
>> >
>> >
>> > On Fri, Jun 29, 2012 at 4:21 PM, Inder Pall <inder.pall@gmail.com> wrote:
>> >>
>> >> Shouldn't each hop in a multi-hop setup use its own channel? Why would
>> >> you want to share a channel across agents?
>> >> Something like:
>> >>
>> >> hop1 -<AVROSOURCE| MEM CHANNEL | AVRO SINK>
>> >> hop'n' -<AVROSOURCE| MEM CHANNEL | FILE SINK>
>> >>
>> >> - inder
>> >>
>> >>
>> >> On Fri, Jun 29, 2012 at 4:10 PM, Mohammad Tariq <dontariq@gmail.com>
>> >> wrote:
>> >>>
>> >>> This will be a multi-hop setup if I am not wrong.
>> >>>
>> >>> Regards,
>> >>>     Mohammad Tariq
>> >>>
>> >>>
>> >>> On Fri, Jun 29, 2012 at 4:04 PM, Inder Pall <inder.pall@gmail.com> wrote:
>> >>> > What is the sink you have configured in the first agent?
>> >>> > To connect the pipelines, AFAIK you need to have
>> >>> > <Somesource><channel><AvroSink> on the first agent and then
>> >>> > <AvroSource><channel><Whatever sink you need> at the second agent.
>> >>> >
>> >>> > If you have configured it as mentioned below, then it's not what you
>> >>> > are asking
>> >>> > ->
>> >>> >
>> >>> > If I get your question right, you want the source and sink of a single
>> >>> > agent to be able to run on different machines and share a common
>> >>> > distributed channel. I haven't tested it, but the JDBC channel might
>> >>> > address this use case.
>> >>> >
>> >>> > Btw, it would be nice to understand the motivation behind such a
>> >>> > configuration.
>> >>> >
>> >>> > Thanks,
>> >>> > - inder
>> >>> >
>> >>> >
>> >>> > On Fri, Jun 29, 2012 at 3:53 PM, shekhar sharma <shekhar2581@gmail.com> wrote:
>> >>> >>
>> >>> >> Hi Mohammad,
>> >>> >> Thanks for the quick reply.
>> >>> >> I think it's possible.
>> >>> >> What I did: on one machine I created a properties file which
>> >>> >> consists of only Avro source properties (like host name, port and
>> >>> >> channel).
>> >>> >>
>> >>> >> Then on another machine I created another properties file which
>> >>> >> consists of Avro sink properties (host name and port to connect to).
>> >>> >> In the log file on the machine where the Avro source is running, you
>> >>> >> can see that the connection has been established between the two.
>> >>> >>
>> >>> >> Regards,
>> >>> >> Som Shekhar
>> >>> >>
>> >>> >>
>> >>> >>
>> >>> >> On Fri, Jun 29, 2012 at 3:34 PM, Mohammad Tariq <dontariq@gmail.com> wrote:
>> >>> >>>
>> >>> >>> Hello Shekhar,
>> >>> >>>
>> >>> >>>          I don't think it is possible in Flume-NG. An agent is
>> >>> >>> composed of a source, a channel and a sink, and together they make
>> >>> >>> a flow. Moreover, each agent is an atomic entity in a Flume-NG flow.
>> >>> >>> But I need a green signal from the experts on this.
>> >>> >>>
>> >>> >>> Regards,
>> >>> >>>     Mohammad Tariq
>> >>> >>>
>> >>> >>>
>> >>> >>> On Fri, Jun 29, 2012 at 3:11 PM, shekhar sharma
>> >>> >>> <shekhar2581@gmail.com>
>> >>> >>> wrote:
>> >>> >>> > Hello,
>> >>> >>> > I am using flume-ng (1.2.0).
>> >>> >>> >
>> >>> >>> > Is it possible to have the source and sink running on different
>> >>> >>> > machines?
>> >>> >>> > For example:
>> >>> >>> > In my Flume agent, if I am using an exec source and a logger sink,
>> >>> >>> > is it possible to have the exec source running on machine 1 and the
>> >>> >>> > logger sink running on machine 2?
>> >>> >>> >
>> >>> >>> > Or is it always necessary to have the source and sink tied
>> >>> >>> > together on a single machine?
>> >>> >>> >
>> >>> >>> >
>> >>> >>> >
>> >>> >>> > Regards,
>> >>> >>> > Som  Shekhar
>> >>> >>
>> >>> >>
>> >>> >
>> >>> >
>> >>> >
>> >>> > --
>> >>> > Thanks,
>> >>> > - Inder
>> >>> >   Tech Platforms @Inmobi
>> >>> >   Linkedin - http://goo.gl/eR4Ub
>> >>
>> >>
>> >>
>> >>
>> >> --
>> >> Thanks,
>> >> - Inder
>> >>   Tech Platforms @Inmobi
>> >>   Linkedin - http://goo.gl/eR4Ub
>> >
>> >
>>
>
>
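
The multi-hop layout discussed in this thread (a source, a channel and an Avro sink on the first machine, feeding an Avro source, a channel and the final sink on the second) could be sketched as the two properties files below. This is an untested sketch, not a verified configuration: the agent names, component names, host `machine2`, ports 4140 and 4141, and the HDFS path are all assumptions chosen for illustration. The source on machine1 lists two channels, so with the default replicating selector each event should reach both the local logger sink and the Avro sink.

```properties
# ---- machine1.properties (hypothetical agent name: agent1) ----
agent1.sources = avroIn
agent1.channels = logCh fwdCh
agent1.sinks = logOut avroOut

agent1.sources.avroIn.type = avro
agent1.sources.avroIn.bind = 0.0.0.0
agent1.sources.avroIn.port = 4140
# Two channels on one source: fan-out within a single agent.
agent1.sources.avroIn.channels = logCh fwdCh

agent1.channels.logCh.type = memory
agent1.channels.fwdCh.type = memory

# Local sink on machine1.
agent1.sinks.logOut.type = logger
agent1.sinks.logOut.channel = logCh

# Avro sink forwards events to the Avro source on machine2 (first hop).
agent1.sinks.avroOut.type = avro
agent1.sinks.avroOut.hostname = machine2
agent1.sinks.avroOut.port = 4141
agent1.sinks.avroOut.channel = fwdCh

# ---- machine2.properties (hypothetical agent name: agent2) ----
agent2.sources = avroIn
agent2.channels = hdfsCh
agent2.sinks = hdfsOut

# Avro source receives events sent by agent1's Avro sink (second hop).
agent2.sources.avroIn.type = avro
agent2.sources.avroIn.bind = 0.0.0.0
agent2.sources.avroIn.port = 4141
agent2.sources.avroIn.channels = hdfsCh

agent2.channels.hdfsCh.type = memory

agent2.sinks.hdfsOut.type = hdfs
agent2.sinks.hdfsOut.hdfs.path = hdfs://machine2:8020/flume/events
agent2.sinks.hdfsOut.channel = hdfsCh
```

Each file defines a complete agent with its own channel, and each agent would be started separately on its own machine, for example with `flume-ng agent -n agent1 -c conf -f machine1.properties`. Only the Avro sink and Avro source pair crosses the network; no channel is shared between the two agents.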
