incubator-s4-user mailing list archives

From Jagmohan Chauhan <>
Subject Performance issue in Yahoo S4
Date Wed, 21 Mar 2012 03:38:52 GMT

We are working with Yahoo S4 for a project. We are using a simple
application that reads words from a file, assembles them into sentences,
and prints the sentences on the console. We have written two PEs for it.
The first PE receives the words emitted by the client adapter, looks for
a period ("."), which marks the end of a sentence, forms the sentence, and
sends it to the next PE. The second PE takes the sentence and prints it on
the console. The file that our client application reads and feeds to the
adapter is 1 GB in size. The first PE is keyless, while for the second one
we ran experiments with a single shared key as well as with different keys.
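
For concreteness, the logic of the first PE can be sketched in plain Java
roughly as follows. This is only an illustration of the word-to-sentence
step: the class and method names (SentenceAssembler, accept, assemble) are
hypothetical, and no S4 API is used or implied.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical plain-Java sketch of the sentence-forming step; not S4 code. */
public class SentenceAssembler {
    // Words seen since the last completed sentence.
    private final StringBuilder current = new StringBuilder();

    /**
     * Feed one word. Returns the completed sentence if the word ends
     * with a period, otherwise returns null and keeps buffering.
     */
    public String accept(String word) {
        if (word == null || word.isEmpty()) {
            return null;
        }
        if (current.length() > 0) {
            current.append(' ');
        }
        current.append(word);
        if (word.endsWith(".")) {
            String sentence = current.toString();
            current.setLength(0); // start a new sentence
            return sentence;
        }
        return null;
    }

    /** Convenience helper: run a whole word stream through one assembler. */
    public static List<String> assemble(Iterable<String> words) {
        SentenceAssembler assembler = new SentenceAssembler();
        List<String> sentences = new ArrayList<>();
        for (String word : words) {
            String sentence = assembler.accept(word);
            if (sentence != null) {
                sentences.add(sentence); // in the real setup this goes to the second PE
            }
        }
        return sentences;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("We", "use", "S4.", "It", "streams", "events.");
        for (String sentence : assemble(words)) {
            System.out.println(sentence);
        }
    }
}
```

Note that the buffer is per-instance state: a sentence is reassembled
correctly only if all of its words reach the same instance in order.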

We are seeing an unexpected result when we vary the number of nodes. We
are running the application on a cluster of 4 machines: 1 machine runs the
client adapter and the other three act as processing nodes. What we
observe is that, for the same data set (file), the execution time
increases as the number of processing nodes increases.

Here are some statistics:

1 node configuration: Time is 2 min 10 sec
2 node configuration: Time is 2 min 30 sec
3 node configuration: Time is 2 min 40 sec

We could not explain this result, as we expected the execution time to
improve as nodes are added. Can anyone please shed some light on this
issue? Is the overhead of disseminating events so high that adding nodes
does not improve the execution time?

Thanks and Regards
Jagmohan Chauhan
MSc student,CS
Univ. of Saskatchewan
IEEE Graduate Student Member
