incubator-s4-user mailing list archives

From Jagmohan Chauhan <simplefundumn...@gmail.com>
Subject Re: Performance issue in Yahoo S4
Date Fri, 23 Mar 2012 04:38:40 GMT
Hi

One thing we observed in our experiments is that the second PE does not
receive further events from the first PE until it has finished processing
the current one. Does this mean that events are sent synchronously between
the two PEs in Yahoo S4? Is there a way to send events asynchronously
between PEs? We think that, because of this synchronous behaviour, we are
not getting better performance as the number of nodes increases.
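
To make the distinction concrete, here is a minimal Java sketch of the two handoff styles between a producing stage and a slow consuming stage. It is plain Java, not S4 code, and it does not use or describe the S4 API; the class name `HandoffSketch`, the method names, the queue capacity and the artificial delay are all assumptions made up for illustration. Whether S4 itself delivers events this way is exactly the open question above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class HandoffSketch {
    // Unique sentinel object; compared by identity so no real sentence can match it.
    private static final String END = new String("<end>");

    /** Stand-in for the second stage: simulates slow work, then records the sentence. */
    public static void consume(String sentence, List<String> out, long delayMs)
            throws InterruptedException {
        Thread.sleep(delayMs);
        out.add(sentence);
    }

    /** Synchronous handoff: the producer waits for each sentence to be fully processed.
     *  Returns the time (ms) the producer spent handing over all sentences. */
    public static long runSynchronous(List<String> sentences, List<String> out, long delayMs)
            throws InterruptedException {
        long start = System.nanoTime();
        for (String s : sentences) {
            consume(s, out, delayMs); // producer is blocked here
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    /** Asynchronous handoff: the producer only enqueues; a separate thread drains the queue.
     *  Returns the time (ms) the producer spent enqueueing all sentences. */
    public static long runAsynchronous(List<String> sentences, List<String> out, long delayMs)
            throws InterruptedException {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(1024); // bounded buffer
        Thread consumer = new Thread(() -> {
            try {
                while (true) {
                    String s = queue.take();
                    if (s == END) {
                        return; // sentinel seen: stop draining
                    }
                    consume(s, out, delayMs);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.start();

        long start = System.nanoTime();
        for (String s : sentences) {
            queue.put(s); // blocks only when the buffer is full
        }
        long producerMs = (System.nanoTime() - start) / 1_000_000;

        queue.put(END);
        consumer.join(); // wait so that 'out' is complete before returning
        return producerMs;
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> sentences = new ArrayList<>();
        for (int i = 0; i < 50; i++) {
            sentences.add("sentence " + i + ".");
        }
        long syncMs = runSynchronous(sentences, new ArrayList<>(), 10);
        long asyncMs = runAsynchronous(sentences, new ArrayList<>(), 10);
        System.out.println("producer busy (sync):  " + syncMs + " ms");
        System.out.println("producer busy (async): " + asyncMs + " ms");
    }
}
```

In this sketch the queue only frees the producer from waiting; the total time to process all sentences is still bounded by the slower stage, so an asynchronous handoff alone would not necessarily shorten an end-to-end run.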

On Thu, Mar 22, 2012 at 12:14 AM, Jagmohan Chauhan <
simplefundumnnit@gmail.com> wrote:

> Hi
>
> Kishore: Thanks for bringing up a very important question. We had
> exactly the same thought today.
>
> Let us give you a brief overview of what we understand about the Yahoo
> S4 architecture in terms of the client adapter, and of our setup.
>
> 1. Our client application runs on one node and reads its input
> sequentially from a file, line by line.
> 2. The client application reads a line, builds a message and dispatches it
> to the client adapter, which runs on the same node. So we use localhost
> and port number 2334.
> 3. We think that the client adapter is an internal part of Yahoo S4 and
> that we do not have to touch it. It should read the input message coming
> from our client application, convert it into a JSON object to make an
> event, and then send it across the network to the S4 nodes for further
> processing. We have three other nodes in the cluster for S4 processing.
> So we have not changed anything in the adapter.
> 4. We are using an NFS filesystem and all nodes are on the same switch.
> 5. We measure the total time as the interval between when our client
> application starts sending data and when it finishes.
>
> There is one other thing which is confusing to us: the port
> number in client -stub.xml is 2334, and that is where we send our messages
> from the client application. But we also see a client adapter port in
> clusters.xml. So what is the relation between the two?
>
> We were thinking of using two client adapters today; we tried but could
> not succeed.
> If someone can clear our doubts and shed some light on how we can run
> multiple client adapters or make the existing adapter multi-threaded, it
> would help us investigate our issue.
>
>
> On Wed, Mar 21, 2012 at 11:35 PM, kishore g <g.kishore@gmail.com> wrote:
>
>> Is the client adapter reading and sending the data sequentially? Then the
>> time taken will be roughly the same in all three cases. You will see an
>> improvement if the client adapter is able to send data to all nodes
>> simultaneously. Is the adapter multi-threaded?
>>
>> Can you tell us more about the setup and how the total time is measured?
>>
>> On Wed, Mar 21, 2012 at 11:04 PM, Jagmohan Chauhan <
>> simplefundumnnit@gmail.com> wrote:
>>
>>> Hi
>>>
>>> We checked it today and saw that the different nodes are
>>> getting different events. We checked by printing the words and they were
>>> different. So the client adapter is not sending the same data to each node.
>>>
>>>
>>> On Wed, Mar 21, 2012 at 9:38 AM, Matthieu Morel <mmorel@apache.org> wrote:
>>>
>>>> Hi,
>>>>
>>>> I wonder whether you are sending the same data from the adapter to each
>>>> of the nodes. Can you check that? (You could compare the final word counts,
>>>> between settings with 1 or more nodes).
>>>>
>>>> Regards,
>>>>
>>>> Matthieu
>>>>
>>>>
>>>> On 3/21/12 4:38 AM, Jagmohan Chauhan wrote:
>>>>
>>>>>  Hi
>>>>>
>>>>> We are working on Yahoo S4 for a project. We are using a simple
>>>>> application that reads words from a file, makes sentences out of
>>>>> them and prints the sentences on the console. We have made two
>>>>> PEs for it. The first PE takes the words emitted by the client
>>>>> adapter, looks for a "." (which marks the end of a sentence), forms a
>>>>> sentence and sends it to the next PE. The second PE takes the sentence
>>>>> and prints it on the console. The file from which our client
>>>>> application reads and feeds input to the adapter is 1 GB in size. The
>>>>> first PE is keyless, while for the second one we performed experiments
>>>>> with the same key as well as with different keys.
>>>>>
>>>>> We are seeing an unusual issue when we try different
>>>>> configurations of nodes. We are running the application on a
>>>>> cluster which has 4 systems.
>>>>> We are using 1 system for the client adapter and the other three as
>>>>> processing nodes. The issue we are observing is that, as the number of
>>>>> nodes increases, the execution time increases for the same data set (file).
>>>>>
>>>>> Here are some statistics :
>>>>>
>>>>> 1 node configuration: Time is 2 min 10 sec
>>>>> 2 node configuration: Time is 2 min 30 sec
>>>>> 3 node configuration: Time is 2 min 40 sec
>>>>>
>>>>>
>>>>> We could not explain this, as we expected that with more
>>>>> nodes we would get a better execution time. Can anyone please shed some
>>>>> light on this issue? Is the overhead of disseminating events so high
>>>>> that the execution time does not improve?
>>>>>
>>>>> --
>>>>> Thanks and Regards
>>>>> Jagmohan Chauhan
>>>>> MSc student,CS
>>>>> Univ. of Saskatchewan
>>>>> IEEE Graduate Student Member
>>>>>
>>>>> http://homepage.usask.ca/~jac735/
>>>>>
>>>>>
>>>>
>>>
>>>
>>> --
>>> Thanks and Regards
>>> Jagmohan Chauhan
>>> MSc student,CS
>>> Univ. of Saskatchewan
>>> IEEE Graduate Student Member
>>>
>>> http://homepage.usask.ca/~jac735/
>>>
>>>
>>
>
>
> --
> Thanks and Regards
> Jagmohan Chauhan
> MSc student,CS
> Univ. of Saskatchewan
> IEEE Graduate Student Member
>
> http://homepage.usask.ca/~jac735/
>
>


-- 
Thanks and Regards
Jagmohan Chauhan
MSc student,CS
Univ. of Saskatchewan
IEEE Graduate Student Member

http://homepage.usask.ca/~jac735/
