incubator-s4-user mailing list archives

From Davide Simoncelli <Davide.Simonce...@neclab.eu>
Subject RE: Adapters and S4 cluster
Date Mon, 11 Jun 2012 07:11:34 GMT
On Friday, June 08, 2012 05:47:26 PM Matthieu Morel wrote:
> On 6/8/12 5:41 PM, Shailendra Mishra wrote:
> > It's dispatched to all S4 nodes. - Shailendra
> 
> Right, within an S4 app, and by default. You may override that by
> providing your own Dispatcher implementation (not yet available in piper).
> 
> And when the event comes from the adapter, it's also overridable and
> round-robin by default (right now it's broadcast in piper, but I'm
> adding something so that round robin is the default).
> 
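
To make the two keyless behaviours mentioned above concrete, here is a minimal sketch in plain Java. It does not use any S4 class: KeylessDispatchSketch, broadcastTargets and RoundRobin are hypothetical names invented for illustration, and partitions are simply numbered from 0. It only shows the difference between "every partition gets the event" and "exactly one partition gets the event, in turn".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical illustration only; none of these types are S4 classes. */
class KeylessDispatchSketch {

    /** Broadcast: a keyless event is delivered to every partition. */
    static List<Integer> broadcastTargets(int partitionCount) {
        List<Integer> targets = new ArrayList<>();
        for (int p = 0; p < partitionCount; p++) {
            targets.add(p);
        }
        return targets;
    }

    /** Round robin: each keyless event goes to exactly one partition, cycling in order. */
    static final class RoundRobin {
        private final int partitionCount;
        private final AtomicInteger next = new AtomicInteger();

        RoundRobin(int partitionCount) {
            if (partitionCount <= 0) {
                throw new IllegalArgumentException("partitionCount must be positive");
            }
            this.partitionCount = partitionCount;
        }

        /** Returns the partition index for the next event. */
        int nextTarget() {
            // floorMod keeps the index non-negative even if the counter overflows.
            return Math.floorMod(next.getAndIncrement(), partitionCount);
        }
    }

    public static void main(String[] args) {
        System.out.println("broadcast targets: " + broadcastTargets(4));
        RoundRobin rr = new RoundRobin(4);
        for (int i = 0; i < 6; i++) {
            System.out.println("event " + i + " -> partition " + rr.nextTarget());
        }
    }
}
```

With broadcast, every keyless entry-point PE sees every event; with round robin, each event is processed once, by whichever partition is next in the cycle.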

But can the client send events with a key?
It is important for me that events with the same key always go to the same PN and PE.

I configured the entry point PEs to receive incoming events by key. The events reach the
PEContainer (I can see that in the log files), but they never reach the Processing Element:
there is a debug call in the processEvent method that should print each received event, and
nothing is shown.
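
As a sketch of the keyed behaviour being asked for here, the plain Java below hashes the key to pick a partition on the sending side and then, on the receiving node, looks up one PE instance per key value. It is not S4 code: KeyedDispatchSketch, partitionFor, Node and CountingPE are hypothetical names, String.hashCode stands in for whatever partitioning scheme is actually configured, and one PE instance per distinct key is an assumption of the sketch.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical illustration only; none of these types are S4 classes. */
class KeyedDispatchSketch {

    /** Sender side: the same key always maps to the same partition index. */
    static int partitionFor(String key, int partitionCount) {
        // floorMod avoids negative indexes when hashCode() is negative.
        return Math.floorMod(key.hashCode(), partitionCount);
    }

    /** Stand-in for a processing element that counts the events it receives. */
    static final class CountingPE {
        int eventsSeen;

        void processEvent(String payload) {
            eventsSeen++;
        }
    }

    /** Receiver side: the node routes each event to the PE instance for its key. */
    static final class Node {
        private final Map<String, CountingPE> instancesByKey = new HashMap<>();

        CountingPE deliver(String key, String payload) {
            // Assumption of this sketch: one PE instance per distinct key value.
            CountingPE pe = instancesByKey.computeIfAbsent(key, k -> new CountingPE());
            pe.processEvent(payload);
            return pe;
        }
    }

    public static void main(String[] args) {
        int partitions = 4;
        Node[] nodes = new Node[partitions];
        for (int i = 0; i < partitions; i++) {
            nodes[i] = new Node();
        }
        String[] keys = {"user-1", "user-2", "user-1"};
        for (String key : keys) {
            int p = partitionFor(key, partitions);
            CountingPE pe = nodes[p].deliver(key, "payload");
            System.out.println(key + " -> partition " + p + ", events seen by its PE: " + pe.eventsSeen);
        }
    }
}
```

If events show up in the container but not in the PE, one thing such a sketch suggests checking is whether the key the sender hashes on is the same key the receiving PE is registered under.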

Regards

- Davide
> Matthieu
> 
> > On Fri, Jun 8, 2012 at 8:20 AM, Davide Simoncelli
> > 
> > <Davide.Simoncelli@neclab.eu>  wrote:
> >> Last question: is a keyless event dispatched to all S4 nodes or in
> >> round-robin fashion?
> >> 
> >> On Friday, June 08, 2012 03:45:29 PM Matthieu Morel wrote:
> >>> On 6/8/12 2:26 PM, Davide Simoncelli wrote:
> >>>> Thank you a lot for your clarifications!
> >>>> 
> >>>> On Friday, June 08, 2012 11:37:35 AM Matthieu Morel wrote:
> >>>>> On 6/8/12 8:47 AM, Davide Simoncelli wrote:
> >>>>>> Hello Matthieu,
> >>>>>> 
> >>>>>> unfortunately it didn't help me. Let me do an example.
> >>>>>> 
> >>>>>> Suppose we have 3 nodes in the adapter cluster and 4 nodes in the S4
> >>>>>> cluster (2 nodes belongs to partition 0 and others ones to partition
> >>>>>> 1).
> >>>>>> There are 3 PEs:
> >>>>>> - FirstPE: it receives keyless event (it is the entry point)
> >>>>>> - SecondPE: it receives events from FirstPE and sends new events to
> >>>>>>   ThirdPE
> >>>>>> - ThirdPE: it receives events from SecondPE and outputs something
> >>>>>> 
> >>>>>> The client injects events with the Driver which uses a TCP/IP
> >>>>>> connection to talk with client IO stub of the adapter on port 2334
> >>>>>> (the default one is GenericJsonClientStub). As I understood injected
> >>>>>> events (without key) are dispatched to all keyless PEs for each PN.
> >>>>>> But what about the two partitions?
> >>>>> 
> >>>>> Let me clarify partitions vs nodes. When you configure the cluster,
> >>>>> you
> >>>>> define the number of partitions. When nodes are started and attached
> >>>>> to
> >>>>> the cluster, they are assigned a partition. There is only 1 partition
> >>>>> per node. (I don't see how you could get "2 nodes belongs to partition
> >>>>> 0
> >>>>> and others ones to partition 1").
> >>>> 
> >>>> I thought a partition was a kind of PN container. What is the meaning
> >>>> of
> >>>> having a partition with just one PN?
> >>> 
> >>> Multiple PNs per partition would mean replicating processing, since the
> >>> PEs in those PNs would be receiving the exact same messages. In S4 you
> >>> want different partitions to receive different keyed events (when they
> >>> are keyed).
> >>> 
> >>>>> Since the deployment is symmetrical, all PEs are deployed on all
> >>>>> nodes:
> >>>>> there are instances of FirstPE in all nodes. Then (from
> >>>>> http://docs.s4.io/manual/client_adapter.html) , [clients] send events
> >>>>> to
> >>>>> the S4 cluster. These events may either be keyed or keyless. In the
> >>>>> latter case, the corresponding events are dispatched round-robin.
> >>>> 
> >>>> So if the PE is keyless just one instance per PN exists. Is it right?
> >>> 
> >>> Yes
> >>> 
> >>>>>> An adapter sends events to S4 clusters. How are events dispatched
> >>>>>> from the client to adapters in the adapter cluster?
> >>>>> 
> >>>>> You use a Driver, as in
> >>>>> https://github.com/s4/twittertopiccount/blob/master/src/main/java/org/apache/s4/example/twittertopiccount/TwitterFeedListener.java
> >>>> 
> >>>> Yea, I use the same driver. But if there are more than one adapter,
> >>>> should
> >>>> the client know its address and port to connect to?
> >>> 
> >>> That's also my understanding. There are various possible ways to do that
> >>> automatically.
> >>> 
> >>> 
> >>> Matthieu
> >>> 
> >>>> Regards
> >>>> 
> >>>> - Davide
> >>>> 
> >>>>>> When the FirstPE sends a new event, the dispatcher first chooses the
> >>>>>> partition and then the PN (in both cases an hash function is used to
> >>>>>> know the target). Is it right?
> >>>>> 
> >>>>> Almost! The dispatcher sends to the correct partition that it gets
> >>>>> from
> >>>>> the partitioning scheme. Dispatch to the correct PE is done in the
> >>>>> receiver node.
> >>>>> 
> >>>>> Regards,
> >>>>> 
> >>>>> Matthieu
> >>>>> 
> >>>>>> Thank you for your time
> >>>>>> 
> >>>>>> - Davide
> >>>>>> ________________________________________
> >>>>>> From: Matthieu Morel [mm@s4.io] on behalf of Matthieu Morel
> >>>>>> [mmorel@apache.org] Sent: Thursday, June 07, 2012 10:37 AM
> >>>>>> To: s4-user@incubator.apache.org
> >>>>>> Subject: Re: Adapters and S4 cluster
> >>>>>> 
> >>>>>> Would this page answer your questions?
> >>>>>> http://docs.s4.io/manual/client_adapter.html
> >>>>>> 
> >>>>>> Regards,
> >>>>>> 
> >>>>>> Matthieu
> >>>>>> 
> >>>>>> On 6/6/12 6:13 PM, Davide Simoncelli wrote:
> >>>>>>> Hello,
> >>>>>>> 
> >>>>>>> I'm implementing an application with S4 and I would like to know how
> >>>>>>> events are dispatched between adapters and nodes in the cluster when
> >>>>>>> the client generates streams.
> >>>>>>> 
> >>>>>>> Thank you