commons-user mailing list archives

From "R.C. Hoekstra" <>
Subject [scxml] Re: Re: Re: our project setup: any tips specifically on performance/speed?
Date Fri, 30 May 2014 21:41:42 GMT
Hi Ate, hi list, 

I can share some more information now, and post some code. 

First of all, on our project: 

No, we don't use threads at the moment. Multithreading looks like it would become a nightmare
with over 100,000 engine instances.

We also don't use the <datamodel> tags at the moment. As datamodel support is still in the
planning stage according to the roadmap, I found it a bit risky to rely on it. It also seems
a bit inconvenient for our purpose, but if that's not the case I'm happy to be convinced
otherwise.

(De)serialization is not an issue at the moment, though it might become one in the future.
We're considering two strategies: either log every transition and analyse that log with
some tool, or dump the whole population of 100,000 agents / SCXMLExecutor instances at fixed
times into some database structure for further analysis. We are still open to suggestions
on this, though that is of course not really an scxml issue.
For the rest: the simulation just runs in memory, and for now that doesn't seem to be
a problem for the hardware.
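
For the first strategy (logging every transition), a minimal sketch could look like the following. TransitionLog, its column layout and the record() parameters are assumptions made up for illustration; they are not part of Commons SCXML or of the project code.

```java
import java.io.PrintWriter;

/** Hypothetical append-only transition log; one CSV line per transition. */
final class TransitionLog {
    private final PrintWriter out;

    TransitionLog(PrintWriter out) {
        this.out = out;
        // Header row; the column names are an assumption for this sketch.
        out.print("time,agent,from,to,event\n");
    }

    /** Appends one transition: simulation time, agent id, source state, target state, event name. */
    void record(double time, int agentId, String from, String to, String event) {
        out.print(time + "," + agentId + "," + from + "," + to + "," + event + "\n");
    }

    /** Pushes buffered lines to the underlying writer. */
    void flush() {
        out.flush();
    }
}
```

Each agent's listener callback could call record() once per transition; the resulting file can then be analysed offline with any tool that reads CSV.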

So here are some key parts of our code / setup (Ate, I can post them without problems now).

The main question is: which of these elements are possible performance bottlenecks, and
are there any alternatives for them?

So the disease is modelled by a human.scxml file, which defines the states of the disease.

	<state id="asymptomatic">
		<onentry>
			<script>agent.infectivity = 0.2</script>
			<ntd:schedule event="from.asymptomatic" distr="EXPONENTIAL" mean="72 d" chances="0.01"/>
		</onentry>
		<transition event="from.asymptomatic" cond=" == 0" target="symptomatic"/>
		<transition event="from.asymptomatic" cond=" == 1" target="fullyRecovered"/>
	</state>

The ntd:schedule tag is responsible for scheduling the transitions in our EventManager. The
Action class behind it uses the attributes to draw a random number and schedules the passed
event at the resulting time in the EventManager. When that scheduled time has come, the
EventManager sends the event back to the state machine instance in order to trigger the
transition.
The chances attribute defines the chance of each transition: it is a space-separated
list of doubles. In this case the first transition gets a chance of 0.01 of happening, and
as no other number is given, the final remaining transition gets a chance of 0.99. The index
number of the chosen transition is passed back via the payload and tested in the cond
attribute of the transition.
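
To make the chances rule concrete, here is a minimal sketch of how a space-separated chances list could be turned into a transition index, with the last transition receiving the remainder. TransitionChooser and its methods are hypothetical names for illustration only; the real Action class may well work differently.

```java
import java.util.Random;

/** Hypothetical helper: maps a "chances" attribute to a transition index. */
final class TransitionChooser {
    // cumulative[i] is the probability that the chosen index is <= i.
    private final double[] cumulative;

    /** Parses a list such as "0.01"; the final transition gets whatever probability is left. */
    TransitionChooser(String chances) {
        String trimmed = chances.trim();
        String[] parts = trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
        cumulative = new double[parts.length + 1];
        double sum = 0.0;
        for (int i = 0; i < parts.length; i++) {
            double p = Double.parseDouble(parts[i]);
            if (p < 0.0) {
                throw new IllegalArgumentException("negative chance: " + p);
            }
            sum += p;
            cumulative[i] = sum;
        }
        if (sum > 1.0 + 1e-9) {
            throw new IllegalArgumentException("chances sum to more than 1: " + sum);
        }
        cumulative[parts.length] = 1.0; // remainder goes to the last transition
    }

    /** Maps a uniform draw u in [0, 1) to a transition index. */
    int indexFor(double u) {
        for (int i = 0; i < cumulative.length - 1; i++) {
            if (u < cumulative[i]) {
                return i;
            }
        }
        return cumulative.length - 1;
    }

    /** Draws an index using the given random number generator. */
    int draw(Random rng) {
        return indexFor(rng.nextDouble());
    }
}
```

With chances="0.01", draws below 0.01 give index 0 and everything else gives index 1, which is the value the cond attributes test for.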

The script tag is responsible for setting agent properties - in this case the infectivity.
Each agent has its own SCXMLExecutor instance and sets itself in the executor's root context,
so it can be accessed from the scxml files. Each agent is also its own SCXML listener.
Everything is single-threaded. I think the EventManager is smart enough to handle and store
100,000 or more scheduled events. The EventManager is passed to our own implementation of
Evaluator, which is simply a subclass of JexlEvaluator, but aware of math (statistical
distributions & random number generator), the EventManager and the simulation time.
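
For readers who want a picture of the scheduling side, below is a rough single-threaded sketch of a time-ordered event queue with an exponential delay draw. It is not our actual EventManager: SimpleEventQueue, Scheduled and all method names are assumptions for illustration, and the exponential sample uses plain inverse-transform sampling.

```java
import java.util.PriorityQueue;
import java.util.Random;

/** Hypothetical single-threaded event queue, ordered by simulation time. */
final class SimpleEventQueue {

    /** One scheduled event: fire time, target agent, event name, chosen transition index. */
    static final class Scheduled implements Comparable<Scheduled> {
        final double time;
        final int agentId;
        final String event;
        final int index;

        Scheduled(double time, int agentId, String event, int index) {
            this.time = time;
            this.agentId = agentId;
            this.event = event;
            this.index = index;
        }

        @Override
        public int compareTo(Scheduled other) {
            return Double.compare(time, other.time);
        }
    }

    private final PriorityQueue<Scheduled> queue = new PriorityQueue<>();
    private double now = 0.0;

    /** Inverse-transform sample of an exponential delay with the given mean (always >= 0). */
    static double exponential(double mean, Random rng) {
        return -mean * Math.log(1.0 - rng.nextDouble());
    }

    /** Schedules an event 'delay' time units after the current simulation time. */
    void schedule(double delay, int agentId, String event, int index) {
        if (delay < 0.0) {
            throw new IllegalArgumentException("negative delay: " + delay);
        }
        queue.add(new Scheduled(now + delay, agentId, event, index));
    }

    /** Removes and returns the earliest event, advancing the clock; null if the queue is empty. */
    Scheduled next() {
        Scheduled s = queue.poll();
        if (s != null) {
            now = s.time;
        }
        return s;
    }

    double now() {
        return now;
    }

    int size() {
        return queue.size();
    }
}
```

A binary heap like PriorityQueue gives O(log n) insert and removal, so 100,000 pending events should not be a problem in themselves; the per-event cost of evaluating cond expressions and scripts is a separate matter.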

Then, we also need to model treatments. So we have one main scxml file with the following:

	<parallel id="alive">
		<onentry>
			<!-- this schedules natural death... -->
			<ntd:schedule event="die.natural" distr="EXPONENTIAL" mean="14600.0 d" />
		</onentry>
		<transition event="die.natural" target="death" />
		<state id="biology" src="VL_human.scxml" />
		<state id="treatments" src="VL_treatments.scxml" />
	</parallel>

	<final id="death"/>

So the biology definitions define the disease itself and all its states; the treatments
define how patients are treated: subjects pass through the states "untreated", "tested" and
various possible treatments. As treatments can happen in various states of the disease, I
thought it best to use parallel states here. A treatment raises events which in "biology"
will cause the patient to recover.

I hope this is enough to give a rough picture, and I hope you guys can point out some
performance bottlenecks. I understand, however, that most performance bottlenecks are
expected to be in the IO/(de)serialization part.
I could post some more code snippets if that is helpful or interesting.

thanks, best regards, Rinke
