incubator-s4-user mailing list archives

From kishore g <g.kish...@gmail.com>
Subject Re: Performance issue in Yahoo S4
Date Thu, 22 Mar 2012 06:35:34 GMT
Is the client adapter reading and sending the data sequentially? If so, the
time taken will be roughly the same in all three cases. You will only see an
improvement if the client adapter is able to send data to all nodes
simultaneously. Is the adapter multithreaded?
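To illustrate the sequential versus concurrent point, here is a minimal sketch in Python. It does not use S4's (Java) adapter API. It assumes the adapter's cost is dominated by a fixed, blocking per-event send; `send_to_node`, `partition`, `send_sequential` and `send_concurrent` are hypothetical names, and the sleep only simulates network latency. With a single sender thread, the total time is the sum of every send no matter how many nodes there are. With one sender thread per node, the per-node streams overlap in time.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def send_to_node(node_id, event, delay=0.001):
    """Hypothetical per-event send; the sleep models a fixed network/serialization cost.

    This is NOT the S4 adapter API, just a stand-in for one blocking send.
    """
    time.sleep(delay)
    return node_id, event


def partition(events, num_nodes):
    """Round-robin split so each node receives a different slice of the stream."""
    buckets = [[] for _ in range(num_nodes)]
    for i, event in enumerate(events):
        buckets[i % num_nodes].append(event)
    return buckets


def send_sequential(events, num_nodes):
    """A single thread performs every send in turn, so total time ~ len(events) * delay."""
    delivered = []
    for node_id, bucket in enumerate(partition(events, num_nodes)):
        for event in bucket:
            delivered.append(send_to_node(node_id, event))
    return delivered


def send_concurrent(events, num_nodes):
    """One sender thread per node, so total time ~ (len(events) / num_nodes) * delay."""
    buckets = partition(events, num_nodes)

    def drain(node_id):
        # Each worker sends only its own node's slice, independently of the others.
        return [send_to_node(node_id, event) for event in buckets[node_id]]

    with ThreadPoolExecutor(max_workers=num_nodes) as pool:
        per_node = list(pool.map(drain, range(num_nodes)))
    return [item for batch in per_node for item in batch]


if __name__ == "__main__":
    words = [f"w{i}" for i in range(300)]
    for n in (1, 2, 3):
        start = time.perf_counter()
        send_sequential(words, n)
        t_seq = time.perf_counter() - start
        start = time.perf_counter()
        send_concurrent(words, n)
        t_conc = time.perf_counter() - start
        print(f"{n} node(s): sequential {t_seq:.2f}s, concurrent {t_conc:.2f}s")
```

Under these assumptions the sequential time should stay roughly flat as nodes are added, while the concurrent time should drop. In a real deployment the gain also depends on the network and on whether the processing nodes can keep up.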

Can you tell us more about the setup and how the total time is measured?

On Wed, Mar 21, 2012 at 11:04 PM, Jagmohan Chauhan <simplefundumnnit@gmail.com> wrote:

> Hi
>
> We checked it today and saw that the different nodes are getting
> different events. We printed the words and they were different, so the
> client adapter is not sending the same data to each node.
>
>
> On Wed, Mar 21, 2012 at 9:38 AM, Matthieu Morel <mmorel@apache.org> wrote:
>
>> Hi,
>>
>> I wonder whether you are sending the same data from the adapter to each
>> of the nodes. Can you check that? (You could compare the final word counts
>> between settings with 1 node and with more nodes.)
>>
>> Regards,
>>
>> Matthieu
>>
>>
>> On 3/21/12 4:38 AM, Jagmohan Chauhan wrote:
>>
>>>  Hi
>>>
>>> We are working on Yahoo S4 for a project. We are using a simple
>>> application that reads words from a file, makes sentences out of them
>>> and prints the sentences on the console. We have written two PEs for it.
>>> The first PE extracts the words thrown by the client adapter, looks for
>>> the ".", which marks the end of a sentence, forms a sentence and sends
>>> it to the next PE. The second PE takes the sentence and prints it on the
>>> console. The file from which our client application reads and feeds
>>> input to the adapter is 1 GB. The first PE is keyless, while for the
>>> second one we performed experiments with the same key as well as with
>>> different keys.
>>>
>>> We are seeing an unusual issue when we try different node
>>> configurations. We are running the application on a cluster of 4
>>> systems: 1 system for the client adapter and the other three as
>>> processing nodes. The issue is that, as the number of nodes increases,
>>> the execution time increases for the same data set (file).
>>>
>>> Here are some statistics :
>>>
>>> 1 node configuration: 2 min 10 sec
>>> 2 node configuration: 2 min 30 sec
>>> 3 node configuration: 2 min 40 sec
>>>
>>>
>>> We cannot explain this, as we expected the execution time to improve
>>> with more nodes. Can anyone please shed some light on this? Is the
>>> overhead of disseminating events so high that it cancels out any
>>> improvement in execution time?
>>>
>>> --
>>> Thanks and Regards
>>> Jagmohan Chauhan
>>> MSc student,CS
>>> Univ. of Saskatchewan
>>> IEEE Graduate Student Member
>>>
>>> http://homepage.usask.ca/~jac735/
>>>
>>>
>>
>
>
