incubator-s4-user mailing list archives

From Sky Zhao <sky.z...@ericsson.com>
Subject RE: About 200,000 events/s
Date Tue, 25 Jun 2013 12:15:18 GMT

I use 4 GB of memory to handle the events, so it feels like S4 eats a lot of memory, CPU, and servers.
How many servers, and how much CPU and memory, are needed to handle 200,000 events/s?


/Sky


From: Sky Zhao [mailto:sky.zhao@ericsson.com]
Sent: Tuesday, June 25, 2013 7:08 PM
To: 's4-user@incubator.apache.org'
Subject: RE: About 200,000 events/s

I list my code here. In my app, I created 3 PEs (the logic is very simple, just emitting streams):


    @Override
    protected void onInit() {

        CsvReporter.enable(new File(mpath), 20, TimeUnit.SECONDS);

        // create a prototype
        EntryPE entryPE = createPE(EntryPE.class);

        createInputStream("seaRawStream", new KeyFinder<Event>() {

            @Override
            public List<String> get(Event event) {
                return Arrays.asList(new String[] { event.get("seadata") });
            }
        }, entryPE);

        ResultPE resultPE = createPE(ResultPE.class);
        Stream<DataEvent> processStream = createStream("Process Stream", new GenKeyFinder(), resultPE);
        processStream.setParallelism(Integer.parseInt(thread));

        ProcessPE processPE = createPE(ProcessPE.class);
        processPE.setDataStream(processStream);

        Stream<DataEvent> entryStream = createStream("Entry Stream", new GenKeyFinder(), processPE);
        entryStream.setParallelism(Integer.parseInt(thread));

        entryPE.setStreams(entryStream);
    }



The event data is:

    String kpi_name;
    String mo_name;
    double kpi_value;
    long timestamp;

    public DataEvent() {
    }

    public DataEvent(String kpi_name, String mo_name, double kpi_value,
            long timestamp) {
        this.kpi_name = kpi_name;
        this.mo_name = mo_name;
        this.kpi_value = kpi_value;
        this.timestamp = timestamp;
    }

    // ... get/set methods


Keyfinder:

public class GenKeyFinder implements KeyFinder<DataEvent> {


    public List<String> get(DataEvent event) {

        List<String> results = new ArrayList<String>();

        /* Retrieve the kpi_name,mo_name and add them to the list. */
        results.add(event.getKpi_name()+":"+event.getMo_name()+":"+String.valueOf(event.getTimestamp()));

        return results;
    }
}
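One factor worth checking here: the key built by GenKeyFinder includes the timestamp, so nearly every event maps to a distinct key, and S4 creates one keyed PE instance per distinct key value. Below is a minimal plain-Java sketch (not the S4 API; names like `buildKey` and the sample KPI/node values are illustrative only) of how the key space grows with and without the timestamp component:

```java
import java.util.HashSet;
import java.util.Set;

public class KeyCardinality {

    // Mirrors GenKeyFinder: kpi_name + ":" + mo_name + ":" + timestamp
    static String buildKey(String kpiName, String moName, long timestamp) {
        return kpiName + ":" + moName + ":" + timestamp;
    }

    // Count distinct keys for n synthetic events over 2 KPIs and 5 nodes,
    // with or without the timestamp component in the key.
    static int distinctKeys(int n, boolean includeTimestamp) {
        Set<String> keys = new HashSet<>();
        for (long t = 0; t < n; t++) {
            String kpi = (t % 2 == 0) ? "cpu_load" : "mem_used";
            String mo = "node" + (t % 5);
            keys.add(includeTimestamp ? buildKey(kpi, mo, t) : kpi + ":" + mo);
        }
        return keys.size();
    }

    public static void main(String[] args) {
        System.out.println("with timestamp:    " + distinctKeys(10_000, true));   // 10000 distinct keys
        System.out.println("without timestamp: " + distinctKeys(10_000, false));  // 10 distinct keys
    }
}
```

With 10,000 events, a timestamped key yields 10,000 distinct keys (one PE instance per event), while dropping the timestamp yields only 10, which may explain high memory and dispatch overhead.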



===== adapter part code

    // sending to S4
    DataEvent event = new DataEvent(kpi_name, mo_name, kpi_value, timestamp);
    event.put("seadata", String.class, kpi_name + ":" + mo_name + ":" + timestamp);
    rstream.put(event);


Does the keyfinder or the event key impact event sending/receiving performance?
From: Matthieu Morel [mailto:mmorel@apache.org]
Sent: Tuesday, June 25, 2013 5:03 PM
To: s4-user@incubator.apache.org
Subject: Re: About 200,000 events/s

Not sure what the issue is in your setup. Performance issues in stream processing can be related to I/O or GC, but the app design can have a dramatic impact as well. Have a look at your CPU usage too.

You might want to profile the app and adapter to identify the culprit in your application.

Another path to explore is related to the content of events. Serialization/deserialization
may be costly for complex objects. What kind of data structure are you keeping in the events?
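On the serialization question: S4 serializes events with Kryo, but the sketch below uses JDK serialization instead so that it is standard-library only; absolute numbers will be pessimistic relative to Kryo, yet the measurement approach carries over. The `DataEvent` here is a hypothetical stand-in, not the app's actual class:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

public class SerializationCost {

    // Hypothetical stand-in for the DataEvent in the app above.
    static class DataEvent implements Serializable {
        String kpiName, moName;
        double kpiValue;
        long timestamp;
        DataEvent(String k, String m, double v, long t) {
            kpiName = k; moName = m; kpiValue = v; timestamp = t;
        }
    }

    // Size in bytes of one serialized event (JDK serialization).
    static long serializedSize(DataEvent e) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(e);
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return bos.size();
    }

    public static void main(String[] args) {
        int n = 100_000;
        long bytes = 0;
        long start = System.nanoTime();
        for (int i = 0; i < n; i++) {
            bytes += serializedSize(new DataEvent("cpu_load", "node1", 0.5, i));
        }
        long ms = Math.max(1, (System.nanoTime() - start) / 1_000_000);
        System.out.printf("%d events in %d ms (~%.0f events/s), ~%d bytes/event%n",
                n, ms, n * 1000.0 / ms, bytes / n);
    }
}
```

Running this for a flat four-field event like the one in this thread gives a baseline; if the measured serialization rate is far above the observed ~2,000 events/s, serialization alone is unlikely to be the bottleneck.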


On Jun 25, 2013, at 09:56, Sky Zhao <sky.zhao@ericsson.com> wrote:

Also, can you show me how to find or trace where the blocking happens? I am still confused about why and where it blocks.

What are you referring to?


How many nodes in twitter example for up to 200,000 events/s?

You'll never get to that number with that application, since the twitter sprinkler feed is
~ 1% of the total feed, and the reported peak rate of the total feed is a few tens of thousands
of tweets / s

But if you were to read from a dump, I'd say a few nodes for the adapter and a few nodes for
the app.


Matthieu



From: Sky Zhao [mailto:sky.zhao@ericsson.com]
Sent: Tuesday, June 25, 2013 10:25 AM
To: 's4-user@incubator.apache.org'
Subject: RE: About 200,000 events/s

Hi Matthieu, I tried the test again after modifying some configuration code, see below. There is no PE logic at all, just event sending (the adapter spends only 38 s sending the events). I don't know where it is blocked.



From: Matthieu Morel [mailto:mmorel@apache.org]
Sent: Tuesday, June 25, 2013 12:25 AM
To: s4-user@incubator.apache.org
Subject: Re: About 200,000 events/s

Hi,

I would suggest to:

1/ check how much you can generate when creating events read from the file - without sending events to a remote stream. This gives you the upper bound for a single adapter (producer).

It takes only 38 s just to read the file in the adapters.


2/ check how much you can consume in the app cluster. By default the remote senders are blocking,
i.e. the adapter won't inject more than what the app cluster can consume. This gives you an
upper bound for the consumer.
I removed all the PE logic, leaving just the emit calls. Very strange: it still takes 600 s, so something seems to be blocking somewhere.
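The blocking behavior Matthieu describes in point 2 is ordinary bounded-queue backpressure: the producer is clamped to the consumer's pace. A minimal java.util.concurrent sketch (an illustration of the mechanism, not the S4 comm layer itself; all names are hypothetical):

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class BackpressureDemo {

    // Producer puts `total` items through a bounded queue; put() blocks when
    // the queue is full, so the producer can never outrun the consumer.
    static long runDemo(int total, int capacity) {
        BlockingQueue<Integer> channel = new ArrayBlockingQueue<>(capacity);
        long[] consumed = {0};
        Thread consumer = new Thread(() -> {
            try {
                for (int i = 0; i < total; i++) {
                    channel.take();
                    consumed[0]++;
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.start();
        try {
            for (int i = 0; i < total; i++) {
                channel.put(i);   // blocks while the queue is full
            }
            consumer.join();      // join gives a happens-before edge for consumed[0]
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return consumed[0];
    }

    public static void main(String[] args) {
        System.out.println("consumed " + runDemo(10_000, 100) + " of 10000 events");
    }
}
```

If the app cluster drains slowly, the adapter's send loop spends most of its time blocked in exactly this way, which would look like "somewhere blocking" from the adapter side.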


3/ use more adapter processes. In the benchmarks subprojects, one can configure the number
of injection processes, and you might need more than one
I tried; it seems to improve a bit, but not obviously - 2,500 events/s maximum.

4/ make sure the tuning parameters you are setting are appropriate. For instance, I am not
sure using 100 threads for serializing events is a good setting (see my notes about context
switching in a previous mail).
I already changed it to 10 threads.

Also note that 200k msg/s/stream/node corresponds to the average rate in one minute _once
the cluster has reached stable mode_. Indeed JVMs typically perform better after a while,
due to various kinds of dynamic optimizations. Do make sure your experiments are long enough.

Here is the metrics report after running for 620 s (with NO PE logic). Listing the file event-emitted@seacluster1@partition-0.csv:

# time,count,1 min rate,mean rate,5 min rate,15 min rate
20,30482,728.4264567247864,1532.1851387286329,577.7027401449616,550.7086872323928
40,68256,1003.4693068790475,1710.158498360041,651.1838653093574,576.4097533046603
60,126665,1526.423951717675,2114.1420782852447,792.5372602881871,626.2019978354255
80,159222,1631.4770294401224,1992.409242530607,865.8704071266087,654.9705320161032
100,206876,1821.551542703833,2070.5009200329328,958.5809893144597,691.1999843184435
120,245115,1852.9758861695257,2044.0217493195803,1021.5173199955651,718.5310141264627
140,286057,1892.8858688327914,2044.444972093294,1084.2498192357743,746.5688788373484
160,324662,1952.4301928970413,2030.146074235012,1151.037578221836,776.8090098793148
180,371829,1959.4350067303067,2066.6134765239876,1202.907690109737,802.6284106918696
200,433557,2283.734039729985,2168.043356631868,1325.4143582100112,853.1681426416412
220,464815,2165.047817425384,2113.0019773033414,1362.4434184467839,876.2949286924105
240,504620,2065.9291371459913,2102.751240490619,1390.829435765405,896.605433400642
260,558158,2192.29985605397,2146.9020962821564,1462.6363950595012,931.9095985316526
280,595910,2199.1021924478428,2128.3677435991726,1513.937239488627,961.2098181907241
300,627181,1994.0377201896988,2090.694355349902,1511.0411290249551,972.3477079289204
320,664342,1959.8933962067783,2076.1406486970463,1534.4815810411815,992.1780521538831
340,698717,1912.6664972668054,2055.1073335596802,1551.413320562825,1009.8797759279687
360,738468,1940.3861933767728,2051.3430577192726,1579.897070517331,1031.4220907479778
380,769673,1808.904535049888,2025.4290025114635,1577.8601886321867,1045.7495364549434
400,807406,1834.34177336605,2018.480153382513,1597.9097843354657,1064.241627735365
420,851009,1918.7250831019285,2026.1694030316874,1634.835168023956,1088.691609058512
440,884427,1854.4469403637247,2010.0049048826818,1637.4121396194184,1101.510206100955
460,927835,1951.4786440268247,2016.9731382075943,1672.0801089577474,1125.0229080682857
480,975961,2076.6338294064494,2033.1853468961958,1719.2462212252974,1153.158261927912
500,1018777,2094.823218886582,2037.4852753404898,1746.4288709310954,1174.8624939692247
520,1063733,2137.998709779158,2045.5028969721004,1778.8271760046641,1198.4702839108422
540,1105407,2121.91619308999,2046.9043296729049,1798.2960826750962,1217.8576817035882
560,1148338,2126.4127152635606,2050.4509561002797,1820.616007231527,1238.2444234074262
580,1189999,2113.61602051879,2051.570290715584,1837.4446097375073,1256.7786337375683
600,1219584,1937.6445710562814,2031.6753069957329,1814.671731059335,1261.7477397513687
620,1237632,1632.4506998235333,1995.2499874637524,1755.4487350018405,1253.8445924435982
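As a sanity check on the table's columns (time in seconds, cumulative count, then 1-min/mean/5-min/15-min rates), the mean rate can be recomputed from the last row; a tiny sketch (class and method names are illustrative):

```java
public class MetricsCheck {

    // mean rate = cumulative event count / elapsed seconds
    static double meanRate(long count, long elapsedSeconds) {
        return (double) count / elapsedSeconds;
    }

    public static void main(String[] args) {
        // Last row of the report above: t = 620 s, count = 1,237,632
        System.out.printf("mean rate ~ %.0f events/s%n", meanRate(1_237_632L, 620));
        // ~1996 events/s, consistent with the reported mean rate of ~1995
        // and roughly two orders of magnitude below the 200,000 events/s target
    }
}
```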



These values seem very strange for my example - far from 200,000 events/s/node/stream.


Regards,

Matthieu


On Jun 24, 2013, at 11:19, Sky Zhao <sky.zhao@ericsson.com> wrote:

I tried using the adapter to send S4 events. The metrics report shows:
20,10462,88.63259092217602,539.6449108859357,18.577650313690874,6.241814566462701
40,36006,417.83633322358764,914.1057643161282,97.55624823196746,33.40088245418529
60,63859,674.1012974987167,1075.2326549158463,176.33878995148274,61.646803531230724
80,97835,953.6282787690939,1232.2934375999696,271.48890371088254,96.56144395108957
100,131535,1162.2060916405578,1323.3704459079934,363.98505627735324,131.98430793014757
120,165282,1327.52314133145,1384.2675551261093,453.5195236495672,167.61679021575551
140,190776,1305.7285112621298,1368.4361242524062,504.7782182758366,191.36049732440895

~20,000 events per 20 s => ~1,000 events/s

Very slow. I modified S4_HOME/subprojects/s4-comm/bin/default.s4.comm.properties:

s4.comm.emitter.class=org.apache.s4.comm.tcp.TCPEmitter
s4.comm.emitter.remote.class=org.apache.s4.comm.tcp.TCPRemoteEmitter
s4.comm.listener.class=org.apache.s4.comm.tcp.TCPListener

# I/O channel connection timeout, when applicable (e.g. used by netty)
s4.comm.timeout=1000

# NOTE: the following numbers should be tuned according to the application, use case, and
infrastructure

# how many threads to use for the sender stage (i.e. serialization)
#s4.sender.parallelism=1
s4.sender.parallelism=100
# maximum number of events in the buffer of the sender stage
#s4.sender.workQueueSize=10000
s4.sender.workQueueSize=100000
# maximum sending rate from a given node, in events / s (used with throttling sender executors)
s4.sender.maxRate=200000

# how many threads to use for the *remote* sender stage (i.e. serialization)
#s4.remoteSender.parallelism=1
s4.remoteSender.parallelism=100
# maximum number of events in the buffer of the *remote* sender stage
#s4.remoteSender.workQueueSize=10000
s4.remoteSender.workQueueSize=100000
# maximum *remote* sending rate from a given node, in events / s (used with throttling *remote*
sender executors)
s4.remoteSender.maxRate=200000

# maximum number of pending writes to a given comm channel
#s4.emitter.maxPendingWrites=1000
s4.emitter.maxPendingWrites=10000

# maximum number of events in the buffer of the processing stage
#s4.stream.workQueueSize=10000
s4.stream.workQueueSize=100000

This only improved the rate from ~500 to ~1,000 events/s.

Reading the 88 MB file takes only 8 s, but sending the events has now taken 620 s for 1,237,632 events. Why is it so slow? S4 is supposed to reach 200,000 events/s - how can I get up to that value? Please give me detailed instructions.
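A back-of-envelope check on these figures (the 88 MB, 1,237,632 events, and 620 s come from the messages above; everything else is derived, and the helper names are illustrative) suggests the network is nowhere near saturated, pointing at per-event overhead rather than raw bandwidth:

```java
public class Throughput {

    static double bytesPerEvent(long fileBytes, long eventCount) {
        return (double) fileBytes / eventCount;
    }

    static double eventsPerSecond(long eventCount, long seconds) {
        return (double) eventCount / seconds;
    }

    public static void main(String[] args) {
        long fileBytes = 88L * 1024 * 1024;   // 88 MB input file, read in 8 s
        long events = 1_237_632L;             // events sent
        long sendSeconds = 620;               // time spent sending

        double perEvent = bytesPerEvent(fileBytes, events);   // ~75 bytes/event
        double rate = eventsPerSecond(events, sendSeconds);   // ~2,000 events/s
        double kbPerSec = perEvent * rate / 1024;             // ~145 KB/s on the wire

        System.out.printf("~%.0f bytes/event, ~%.0f events/s, ~%.0f KB/s%n",
                perEvent, rate, kbPerSec);
    }
}
```

At roughly 145 KB/s, even a modest LAN link is mostly idle, so the time is likely going to per-event costs: serialization, key computation and dispatch, or blocking on the app cluster.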




