incubator-s4-user mailing list archives

From Jagmohan Chauhan <simplefundumn...@gmail.com>
Subject Re: Performance issue in Yahoo S4
Date Thu, 22 Mar 2012 06:04:31 GMT
Hi

We checked today and found that the different nodes are receiving
different events. We verified this by printing the words on each node, and
they were different. So the client adapter is not sending the same data to
each node.
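
A check like this can also be made quantitative, along the lines of the count comparison suggested in the quoted reply. The following is a minimal sketch, assuming each node logs how many events it received; the function name, the 1% tolerance, and the numbers are hypothetical and are not measurements from this thread.

```python
def classify_distribution(single_node_total, per_node_counts):
    """Compare events seen in a multi-node run against a single-node baseline.

    Returns "partitioned" if the nodes together saw about as many events as
    the single node did, "duplicated" if every node saw about the full
    stream, and "unclear" otherwise.
    """
    tolerance = 0.01 * single_node_total  # assumed 1% slack for lost events
    if abs(sum(per_node_counts) - single_node_total) <= tolerance:
        return "partitioned"
    if all(abs(c - single_node_total) <= tolerance for c in per_node_counts):
        return "duplicated"
    return "unclear"


# Illustrative numbers only, not results from the experiments above.
print(classify_distribution(900, [310, 295, 295]))
print(classify_distribution(900, [900, 900, 900]))
```

If the per-node counts add up to the single-node total, the adapter is splitting the stream; if each node reports the full total, it is broadcasting.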

On Wed, Mar 21, 2012 at 9:38 AM, Matthieu Morel <mmorel@apache.org> wrote:

> Hi,
>
> I wonder whether you are sending the same data from the adapter to each of
> the nodes. Can you check that? (You could compare the final word counts,
> between settings with 1 or more nodes).
>
> Regards,
>
> Matthieu
>
>
> On 3/21/12 4:38 AM, Jagmohan Chauhan wrote:
>
>>  Hi
>>
>> We are working on Yahoo S4 for a project. We are using a simple
>> application that reads words from a file, assembles them into sentences,
>> and prints the sentences on the console. We have written two PEs for it.
>> The first PE receives the words emitted by the client adapter, looks for
>> a "." marking the end of a sentence, forms the sentence, and sends it to
>> the next PE. The second PE takes the sentence and prints it on the
>> console. The file that our client application reads and feeds to the
>> adapter is 1 GB in size. The first PE is keyless, while for the second
>> one we ran experiments with the same key as well as with different keys.
>>
>> We are seeing an unusual issue when we try different node
>> configurations. We are running the application on a cluster of 4
>> systems: 1 system for the client adapter and the other three as
>> processing nodes. With an increasing number of nodes, the execution
>> time increases for the same data set (file).
>>
>> Here are some statistics:
>>
>> 1-node configuration: 2 min 10 sec
>> 2-node configuration: 2 min 30 sec
>> 3-node configuration: 2 min 40 sec
>>
>>
>> We could not explain this, as we expected that adding nodes would
>> improve the execution time. Can anyone please shed some light on this
>> issue? Is the overhead of disseminating events so high that it cancels
>> out any improvement in execution time?
>>
>> --
>> Thanks and Regards
>> Jagmohan Chauhan
>> MSc student,CS
>> Univ. of Saskatchewan
>> IEEE Graduate Student Member
>>
>> http://homepage.usask.ca/~jac735/
>>
>>
>
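
For readers following the thread, the sentence-assembly step described in the original message can be sketched roughly as below. This is a plain-Python illustration, not S4 code: the class and function names are hypothetical, and it assumes words arrive one at a time and that a word ending in "." closes a sentence.

```python
class SentenceAssembler:
    """Buffers incoming words and emits a sentence at each full stop."""

    def __init__(self):
        self._words = []

    def on_word(self, word):
        """Buffer one word; return the finished sentence if it ends in '.'."""
        self._words.append(word)
        if word.endswith("."):
            sentence = " ".join(self._words)
            self._words = []  # start a fresh buffer for the next sentence
            return sentence
        return None


def assemble(words):
    """Feed a sequence of words through one assembler, collecting sentences."""
    assembler = SentenceAssembler()
    return [s for s in map(assembler.on_word, words) if s is not None]


print(assemble(["We", "use", "S4.", "It", "runs."]))
```

In this sketch the buffer lives inside one assembler instance, so an instance that receives only part of the word stream builds its sentences from that part only.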


-- 
Thanks and Regards
Jagmohan Chauhan
MSc student,CS
Univ. of Saskatchewan
IEEE Graduate Student Member

http://homepage.usask.ca/~jac735/
