incubator-s4-user mailing list archives

From Matthieu Morel <mmo...@apache.org>
Subject Re: Adapters and S4 cluster
Date Mon, 11 Jun 2012 16:23:13 GMT
On Mon, Jun 11, 2012 at 9:11 AM, Davide Simoncelli
<Davide.Simoncelli@neclab.eu> wrote:

> On Friday, June 08, 2012 05:47:26 PM Matthieu Morel wrote:
> > On 6/8/12 5:41 PM, Shailendra Mishra wrote:
> > > It's dispatched to all S4 nodes. - Shailendra
> >
> > Right, within an S4 app, and by default. You may override that by
> > providing your own Dispatcher implementation (not yet available in
> > piper).
> >
> > And when the event comes from the adapter, it's also overridable and
> > round-robin by default (right now it's broadcast in piper, but I'm
> > adding something so that round robin is the default).
> >
>
> But can the client send events with a key?
> It is important for me to send the same event to the same PN and PE.
>
> I configured the entry point PEs to receive incoming events by key. They
> are received by the PEContainer (I saw that from log files) but they aren't
> by the Processing Element (there is a debug call in processEvent method
> that should output received events, but nothing is shown).
>

That looks like a key mismatch: you should check that the key, stream, and
event specifications are consistent. A mismatch would explain why the event
is received by the node but not redispatched to any PE.
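
To illustrate the kind of mismatch meant here, below is a minimal sketch in
plain Java. It does not use the S4 API: the class, the routing table, and
the stream and field names ("RawWords", "word", "term") are all hypothetical.
It only shows how an event can reach a node and still match no PE when the
stream name or key field differs from what the PE subscribed to.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only: none of these names come from S4.
final class KeyMismatchSketch {
    // Routing table: "streamName:keyField" -> name of the subscribed PE.
    private final Map<String, String> subscriptions = new HashMap<>();

    // Registers a PE for events on the given stream, keyed on the given field.
    void subscribe(String stream, String keyField, String peName) {
        subscriptions.put(stream + ":" + keyField, peName);
    }

    // Returns the subscribed PE name, or null when no stream/key pair matches.
    String route(String stream, Map<String, String> eventFields) {
        for (String field : eventFields.keySet()) {
            String pe = subscriptions.get(stream + ":" + field);
            if (pe != null) {
                return pe;
            }
        }
        return null; // event was received, but there is nothing to dispatch it to
    }

    public static void main(String[] args) {
        KeyMismatchSketch node = new KeyMismatchSketch();
        node.subscribe("RawWords", "word", "FirstPE");

        Map<String, String> matching = new HashMap<>();
        matching.put("word", "hello");
        Map<String, String> mismatched = new HashMap<>();
        mismatched.put("term", "hello"); // key field differs from the subscription

        System.out.println(node.route("RawWords", matching));
        System.out.println(node.route("RawWords", mismatched));
    }
}
```

In this sketch the second lookup finds no subscriber, which resembles the
symptom described: the container logs the event, but processEvent is never
called.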


I assume you are using S4 0.3. Have you tried the approach described here?
http://docs.s4.io/manual/client_adapter.html#sending-events-into-s4-cluster
Otherwise, you may want to follow the design pattern where keyless events
are received as input and keyless PEs are used to partition the input
stream: http://docs.s4.io/manual/overview.html#special-types-of-pes

Note that S4 piper makes this straightforward: you would use a KeyFinder in
the adapter to identify keys and partition events when sending them to the
S4 app.
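
As a rough sketch of that idea, the following plain Java shows key-based
partitioning with a round-robin fallback for keyless events. It is not the
S4 piper API: SimpleKeyFinder, PartitionSketch, and the "userId" field are
hypothetical names chosen for illustration, and the hashing scheme is an
assumption rather than what S4 actually does.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration only: not the S4 piper API.
final class PartitionSketch {
    // Stand-in for a key finder: extracts the key values from an event.
    interface SimpleKeyFinder<E> {
        List<String> keysOf(E event);
    }

    private final int partitionCount;
    private final AtomicInteger nextKeyless = new AtomicInteger();

    PartitionSketch(int partitionCount) {
        if (partitionCount <= 0) {
            throw new IllegalArgumentException("partitionCount must be positive");
        }
        this.partitionCount = partitionCount;
    }

    // Keyed events: the same key values always map to the same partition.
    // Keyless events (no key values found): partitions are used in turn.
    <E> int partitionFor(E event, SimpleKeyFinder<E> finder) {
        List<String> keys = finder.keysOf(event);
        if (keys == null || keys.isEmpty()) {
            return Math.floorMod(nextKeyless.getAndIncrement(), partitionCount);
        }
        // floorMod keeps the result in [0, partitionCount) for negative hashes.
        return Math.floorMod(String.join("|", keys).hashCode(), partitionCount);
    }

    public static void main(String[] args) {
        PartitionSketch partitioner = new PartitionSketch(4);
        SimpleKeyFinder<Map<String, String>> byUser =
                event -> Collections.singletonList(event.get("userId"));

        Map<String, String> event = new HashMap<>();
        event.put("userId", "u42");

        // Both calls print the same partition index for the same key.
        System.out.println(partitioner.partitionFor(event, byUser));
        System.out.println(partitioner.partitionFor(event, byUser));
    }
}
```

Because the partition depends only on the extracted key values, events
carrying the same key reach the same node, where the receiving node can
then hand them to the matching PE instance.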

Regards,

Matthieu





>
> Regards
>
> - Davide
> > Matthieu
> >
> > > On Fri, Jun 8, 2012 at 8:20 AM, Davide Simoncelli
> > > <Davide.Simoncelli@neclab.eu> wrote:
> > >> Last question: is a keyless event dispatched to all S4 nodes or in
> > >> round-robin fashion?
> > >>
> > >> On Friday, June 08, 2012 03:45:29 PM Matthieu Morel wrote:
> > >>> On 6/8/12 2:26 PM, Davide Simoncelli wrote:
> > >>>> Thank you a lot for your clarifications!
> > >>>>
> > >>>> On Friday, June 08, 2012 11:37:35 AM Matthieu Morel wrote:
> > >>>>> On 6/8/12 8:47 AM, Davide Simoncelli wrote:
> > >>>>>> Hello Matthieu,
> > >>>>>>
> > >>>>>> unfortunately it didn't help me. Let me do an example.
> > >>>>>>
> > >>>>>> Suppose we have 3 nodes in the adapter cluster and 4 nodes in the
> > >>>>>> S4 cluster (2 nodes belongs to partition 0 and others ones to
> > >>>>>> partition 1).
> > >>>>>> There are 3 PEs:
> > >>>>>> - FirstPE: it receives keyless event (it is the entry point)
> > >>>>>> - SecondPE: it receives events from FirstPE and sends new events
> > >>>>>>   to ThirdPE
> > >>>>>> - ThirdPE: it receives events from SecondPE and outputs something
> > >>>>>>
> > >>>>>> The client injects events with the Driver which uses a TCP/IP
> > >>>>>> connection to talk with client IO stub of the adapter on port 2334
> > >>>>>> (the default one is GenericJsonClientStub). As I understood
> > >>>>>> injected events (without key) are dispatched to all keyless PEs
> > >>>>>> for each PN. But what about the two partitions?
> > >>>>>
> > >>>>> Let me clarify partitions vs nodes. When you configure the cluster,
> > >>>>> you define the number of partitions. When nodes are started and
> > >>>>> attached to the cluster, they are assigned a partition. There is
> > >>>>> only 1 partition per node. (I don't see how you could get "2 nodes
> > >>>>> belongs to partition 0 and others ones to partition 1").
> > >>>>
> > >>>> I thought a partition was a kind of PN container. What is the
> > >>>> meaning of having a partition with just one PN?
> > >>>
> > >>> Multiple PNs per partition would mean replicating processing, since
> > >>> the PEs in those PNs would be receiving the exact same messages. In
> > >>> S4 you want different partitions to receive different keyed events
> > >>> (when they are keyed).
> > >>>
> > >>>>> Since the deployment is symmetrical, all PEs are deployed on all
> > >>>>> nodes: there are instances of FirstPE in all nodes. Then (from
> > >>>>> http://docs.s4.io/manual/client_adapter.html), [clients] send
> > >>>>> events to the S4 cluster. These events may either be keyed or
> > >>>>> keyless. In the latter case, the corresponding events are
> > >>>>> dispatched round-robin.
> > >>>>
> > >>>> So if the PE is keyless just one instance per PN exists. Is it
> > >>>> right?
> > >>>
> > >>> Yes
> > >>>
> > >>>>>> An adapter sends events to S4 clusters. How do events are
> > >>>>>> dispatched from the client to adapters in the adapter cluster?
> > >>>>>
> > >>>>> You use a Driver, as in
> > >>>>>
> > >>>>> https://github.com/s4/twittertopiccount/blob/master/src/main/java/org/apache/s4/example/twittertopiccount/TwitterFeedListener.java
> > >>>>
> > >>>> Yea, I use the same driver. But if there are more than one adapter,
> > >>>> should
> > >>>> the client know its address and port to connect to?
> > >>>
> > >>> That's also my understanding. There are various possible ways to do
> > >>> that automatically.
> > >>>
> > >>>
> > >>> Matthieu
> > >>>
> > >>>> Regards
> > >>>>
> > >>>> - Davide
> > >>>>
> > >>>>>> When the FirstPE sends a new event, the dispatcher first chooses
> > >>>>>> the partition and then the PN (in both cases an hash function is
> > >>>>>> used to know the target). Is it right?
> > >>>>>
> > >>>>> Almost! The dispatcher sends to the correct partition that it gets
> > >>>>> from the partitioning scheme. Dispatch to the correct PE is done in
> > >>>>> the receiver node.
> > >>>>>
> > >>>>> Regards,
> > >>>>>
> > >>>>> Matthieu
> > >>>>>
> > >>>>>> Thank you for your time
> > >>>>>>
> > >>>>>> - Davide
> > >>>>>> ________________________________________
> > >>>>>> From: Matthieu Morel [mm@s4.io] on behalf of Matthieu Morel
> > >>>>>> [mmorel@apache.org]
> > >>>>>> Sent: Thursday, June 07, 2012 10:37 AM
> > >>>>>> To: s4-user@incubator.apache.org
> > >>>>>> Subject: Re: Adapters and S4 cluster
> > >>>>>>
> > >>>>>> Would this page answer your questions?
> > >>>>>> http://docs.s4.io/manual/client_adapter.html
> > >>>>>>
> > >>>>>> Regards,
> > >>>>>>
> > >>>>>> Matthieu
> > >>>>>>
> > >>>>>> On 6/6/12 6:13 PM, Davide Simoncelli wrote:
> > >>>>>>> Hello,
> > >>>>>>>
> > >>>>>>> I'm implementing an application with S4 and I would like to know
> > >>>>>>> how events are dispatched between adapters and nodes in the
> > >>>>>>> cluster when the client generates streams.
> > >>>>>>>
> > >>>>>>> Thank you
>
