ignite-user mailing list archives

From Pranas Baliuka <pra...@orangecap.net>
Subject Re: Market data binary messages processed with Ignite and Spark
Date Sat, 15 Oct 2016 21:45:34 GMT
Thanks for your input.

> It depends on how you do the lookup. Is it by ID? Then keep the IDs as small as possible.
> Lookup is fastest in a hash-map type of data structure; in a distributed setting, supported
> by a Bloom filter.
> Apache Ignite can be seen as suitable.

My takeaway: use an int instead of a long for the key.
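To make the quoted advice concrete, here is a minimal sketch of a Bloom filter used as a cheap local membership pre-check before paying for a remote (distributed) cache lookup. This is an illustration only: the filter parameters, hash scheme, and the use of int sequence numbers as keys are assumptions, not the actual Ignite internals.

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: a local, probabilistic membership check.
    A negative answer is definite; a positive answer may be a false positive."""

    def __init__(self, size_bits=1 << 16, num_hashes=4):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, key: int):
        # Derive num_hashes independent bit positions from one keyed hash.
        for i in range(self.num_hashes):
            h = hashlib.blake2b(f"{i}:{key}".encode(), digest_size=8)
            yield int.from_bytes(h.digest(), "big") % self.size

    def add(self, key: int):
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, key: int) -> bool:
        # False => key definitely absent; True => key possibly present.
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))

# Usage: only go to the remote cache when the filter says "possibly present".
bf = BloomFilter()
for seq in range(1, 1001):   # message sequence numbers as small int keys
    bf.add(seq)

present = bf.might_contain(500)        # True (no false negatives)
absent = bf.might_contain(10_000_000)  # False with high probability
```

In a distributed setting each node would keep such a filter for the keys it hosts, so most misses are answered locally without a network round trip.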

> I do not think the caching of the filesystem benefits here, because the key is the data
> structure here (hash map).
My concern is that the cache can be polluted easily … at the same time, with modern OS prefetching
techniques for sequential access it may be suitable.
Reading uncompressed data from SSD with a single consumer (I’d have multiple) is enough to
feed the processing pipeline.

> Maybe you can tell a little bit more about the data. Are the messages dependent? What
> type of calculation do you do?
The data would be a binary payload, e.g. 20 levels of bid/ask of the order book for each change
on the primary market for a specific security. I’d have > 3K securities, i.e. 3K caches to read
from.
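For a payload like that, a fixed-layout binary encoding keeps messages compact and cheap to decode sequentially. The wire format below is entirely hypothetical (field widths, tick/lot units, and the 20-level layout are assumptions for illustration, not the actual feed format):

```python
import struct

# Hypothetical fixed-layout message (an assumption for illustration):
#   uint64 sequence number, uint32 security id, then 20 levels of
#   (bid_price, bid_qty, ask_price, ask_qty) as signed 64-bit ints
#   (prices in ticks, quantities in lots).
LEVELS = 20
HEADER = struct.Struct("<QI")    # 12 bytes
LEVEL = struct.Struct("<qqqq")   # 32 bytes per level
MSG_SIZE = HEADER.size + LEVELS * LEVEL.size  # 652 bytes

def encode(seq, sec_id, levels):
    parts = [HEADER.pack(seq, sec_id)]
    parts += [LEVEL.pack(*lvl) for lvl in levels]
    return b"".join(parts)

def decode(buf):
    seq, sec_id = HEADER.unpack_from(buf, 0)
    levels = [LEVEL.unpack_from(buf, HEADER.size + i * LEVEL.size)
              for i in range(LEVELS)]
    return seq, sec_id, levels

# Round-trip one synthetic order-book update.
levels = [(100 + i, 10, 101 + i, 12) for i in range(LEVELS)]
msg = encode(1, 42, levels)
seq, sec_id, decoded = decode(msg)
```

Fixed-size records also make sequential SSD reads trivial: a consumer can read large chunks and slice every `MSG_SIZE` bytes without any framing logic.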
The purpose is to back-test a trading algorithm (with feedback of the algorithm onto market
liquidity) and find stable regions of parameters delivering good performance on the fitness
function (difference from a selected benchmark).
The search would be iterative, selecting focus areas of the parameter grid so we can zoom in
and spot possibly risky choices within the “good” areas of the parameter space.

Each point in the grid (e.g. 2 parameters optimised) is the result of the fitness function from
simulating a single security over 1 day of market data.
1st iteration: calculate evenly distributed points in the parameter grid; for each point, run
the simulation (trading for 1 day).
nth iteration: select the best-performing points, estimate their density with parametric
statistics, and use the density function to calculate a new grid.
Stop once the density estimate is stable and/or computational resources are exhausted.
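The loop above can be sketched in a few lines. This is a toy, not the real back-tester: it assumes a single parameter, a made-up noisy fitness function standing in for the 1-day simulation, and a normal fit as the "parametric statistics"; only the control flow (even grid, pick best, refit, re-grid) mirrors the description.

```python
import random
import statistics

random.seed(1)

def fitness(p):
    # Stand-in for simulating one security over one trading day;
    # a hypothetical objective with its peak near p = 0.3 plus noise.
    return -(p - 0.3) ** 2 + random.gauss(0, 0.01)

def refine(lo, hi, n_points=21, n_iters=5, top_frac=0.25):
    """Iteratively zoom the 1-D parameter grid onto high-fitness regions."""
    for _ in range(n_iters):
        # Evenly distributed points on the current interval.
        grid = [lo + (hi - lo) * i / (n_points - 1) for i in range(n_points)]
        # Evaluate and keep the best-performing points.
        scored = sorted(((fitness(p), p) for p in grid), reverse=True)
        best = [p for _, p in scored[: max(2, int(top_frac * n_points))]]
        # Parametric density estimate (normal fit) over the best points.
        mu = statistics.mean(best)
        sigma = statistics.stdev(best)
        # New grid concentrated where the best points are dense.
        lo, hi = mu - 2 * sigma, mu + 2 * sigma
    return mu, sigma

mu, sigma = refine(0.0, 1.0)  # should land near the peak at 0.3
```

A real run would replace `fitness` with the 1-day simulation per security, use a 2-D (or higher) grid, and stop on density stability rather than a fixed iteration count.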

The resulting grid would be visualised and provided to an analyst to re-evaluate production
settings, e.g. to see which parameters are stable for particular market conditions.

> On 15 Oct. 2016, at 10:32 pm, Jörn Franke [via Apache Ignite Users] <ml-node+s70518n8315h72@n6.nabble.com>
wrote:
> 
> It depends on how you do the lookup. Is it by ID? Then keep the IDs as small as possible.
> Lookup is fastest in a hash-map type of data structure; in a distributed setting, supported
> by a Bloom filter.
> Apache Ignite can be seen as suitable.
> 
> Depending on what you need to do (maybe your approach requires HyperLogLog structures
> etc.) you may also look at Redis, but from what you describe Ignite is suitable.
> 
> I do not think the caching of the filesystem benefits here, because the key is the data
> structure here (hash map).
> The concrete physical infrastructure to meet your SLAs can only be determined when you
> experiment with real data.
> 
> Maybe you can tell a little bit more about the data. Are the messages dependent? What
> type of calculation do you do?
> 
> > On 15 Oct 2016, at 07:23, Pranas Baliuka <[hidden email]> wrote:
> > 
> > Dear Ignite enthusiasts, 
> > 
> > I am a beginner with Apache Ignite, but I want to prototype a solution using
> > Ignite caches with market data distributed across multiple nodes running
> > Spark RDDs.
> > 
> > I'd like to be able to send sequenced (from 1) binary messages (sized from 40
> > bytes to at most 1 KB) to a custom Spark job processing a multidimensional cube
> > of parameters.
> > Each market data event must be processed exactly once, from #1 to #records, for
> > each parameter combination.
> > Number of messages: ~40-50 M in one batch.
> > 
> > It would be great if you could share your experience with a similar implementation.
> > 
> > My high-level thinking:
> > * Prepare the system by loading an Ignite cache (unzip the market data drop-copy
> > file, convert it to the preferred binary format and publish an IgniteCache<Long,
> > BinaryObject>);
> > * Spawn a Spark job to process the input cube of parameters (Spark RDD), each task
> > using the same IgniteCache (accessed sequentially by sequence number, from 1
> > to #messages, as the key);
> > * Store results in RDBMS/NoSQL storage;
> > * Produce reports from Apache Zeppelin using the Spark.R interpreter.
> > 
> > I need the cache to outlive the Spark jobs, i.e. I may run a different cube of
> > parameters after one is finished.
> > 
> > I am not sure if Ignite would be able to look up messages efficiently (I'd
> > need ~400K messages/s sustained retrieval).
> > Or should I consider something more file-oriented, e.g. a memory-mounted
> > file system on each node ...
> > 
> > Thanks in advance for sharing your ideas/proposals/know-how!
> > 
> 
> 



