ignite-user mailing list archives

From diopek <deha.pe...@gmail.com>
Subject Re: Data Loading Performance Issue
Date Wed, 18 Nov 2015 06:39:41 GMT
Hi Val,
Please see my comments below:

*Val:* Can you please clarify what you mean by "batch run time"? Is it
somehow connected to data loading via the store, or is it a different issue?
Batch run time: the first step is cache initialization (the most
time-consuming step), and the second step performs some computations and
generates outputs using the cache populated in the first step. The cache is
loaded via the store. Currently the most important issue is that the overall
batch runs faster on my local PC (Windows 7, 8 CPU/32 GB RAM) than on a much
more powerful Linux server (64 CPUs and 1 TB RAM, which also has a faster
network connection to the Oracle DB). As a side note, the deployment package
for all servers was built on my Windows server with the 64-bit Windows
version of JDK 1.8.0_65; the Linux boxes run the Linux version of the same
JDK 1.8.0_65. I am literally puzzled as to what is causing this delay.
*Val:* I noticed that you put lists instead of individual entries into the
cache. What is the size of these lists? My suspicion is that most of the time
is spent on serialization of the values (the JCache spec has pass-by-value
semantics, so we have to do this even in a LOCAL cache).

Yes, my cache structure is IgniteCache<Integer, ArrayList<MyBusinessObj>>,
as I needed to process records that share certain common attributes
together. These are trade positions that share attributes such as date,
acct, currency, etc. After reading into the cache, during the processing
stage each group is split into more granular records (e.g., 10 records
become 1000 records), and then I aggregate (group by) them so the number of
records shrinks, say to 500. Then I write these records directly into a feed
file. During the interim processing/computation they don't get stored back
into the Ignite cache; they stay in regular Java memory until flushed to the
file.
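To illustrate the grouping step described above, here is a minimal, self-contained sketch (the `Position` class and its fields are hypothetical stand-ins for MyBusinessObj; the composite-hash key is one possible choice for the cache's Integer key):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

public class GroupByKeyExample {
    // Hypothetical stand-in for MyBusinessObj, holding the grouping attributes.
    static class Position {
        final String date, acct, currency;
        Position(String date, String acct, String currency) {
            this.date = date; this.acct = acct; this.currency = currency;
        }
    }

    // Group rows by a composite of their common attributes; each resulting
    // ArrayList would become one value in the IgniteCache<Integer, ArrayList<...>>.
    static Map<Integer, ArrayList<Position>> group(List<Position> rows) {
        Map<Integer, ArrayList<Position>> groups = new LinkedHashMap<>();
        for (Position p : rows) {
            int key = Objects.hash(p.date, p.acct, p.currency);
            groups.computeIfAbsent(key, k -> new ArrayList<>()).add(p);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<Position> rows = Arrays.asList(
            new Position("2015-11-18", "A1", "USD"),
            new Position("2015-11-18", "A1", "USD"),
            new Position("2015-11-18", "A2", "EUR"));
        // Two rows share date/acct/currency, so we end up with two groups.
        System.out.println(group(rows).size());
    }
}
```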
Also, all the grouped records in the cache are read by multiple partition
threads; say the cache has 5,000,000 records and there are 5 partition
threads, each thread reads only its own partition of records.
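The per-partition reading described above can be done with Ignite's ScanQuery, which accepts a partition number. A minimal sketch (not a tested example; cache name and value types are illustrative, and a real setup would dispatch each partition to its own worker thread):

```java
import java.util.ArrayList;
import javax.cache.Cache;
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;
import org.apache.ignite.cache.query.ScanQuery;

public class PartitionScanSketch {
    public static void main(String[] args) {
        try (Ignite ignite = Ignition.start()) {
            IgniteCache<Integer, ArrayList<Object>> cache =
                ignite.getOrCreateCache("positions");

            int partitions = ignite.affinity("positions").partitions();

            // Each worker thread would scan one partition at a time.
            for (int p = 0; p < partitions; p++) {
                ScanQuery<Integer, ArrayList<Object>> qry = new ScanQuery<>();
                qry.setPartition(p);
                for (Cache.Entry<Integer, ArrayList<Object>> e : cache.query(qry).getAll()) {
                    // process e.getKey() / e.getValue() for this partition ...
                }
            }
        }
    }
}
```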
If I read all the records into the cache row by row, how can I partition and
process the records as groups? Of course I can partition rows, but then I
lose the grouping. This is also the reason why I grouped the records before
inserting them into the cache in the first place (the DB doesn't have a
natural partition key). In my use cases I sometimes store custom objects as
well as ArrayList and LinkedHashMap as values. I am aware of the
serialization costs, but was not sure about their magnitude. Currently my
first priority is to resolve this Linux deployment slowness; although this
serialization has some cost, I am at least able to group and partition the
records (I am open to suggestions on how to group/partition the records).
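One way to store one value per DB row while preserving the grouping is Ignite's affinity collocation: annotate a field of a composite key with @AffinityKeyMapped, so all rows sharing a group id land in the same partition and can still be processed together. A hypothetical key class (the field names are illustrative, not from the original post):

```java
import org.apache.ignite.cache.affinity.AffinityKeyMapped;

// Hypothetical per-row cache key: rowId keeps the key unique, while groupId
// (e.g. a hash of date/acct/currency) drives partition placement, so all
// rows of one group end up in the same partition.
public class RowKey {
    private final long rowId;

    @AffinityKeyMapped
    private final int groupId;

    public RowKey(long rowId, int groupId) {
        this.rowId = rowId;
        this.groupId = groupId;
    }

    @Override public boolean equals(Object o) {
        if (!(o instanceof RowKey)) return false;
        RowKey k = (RowKey) o;
        return rowId == k.rowId && groupId == k.groupId;
    }

    @Override public int hashCode() {
        return 31 * Long.hashCode(rowId) + groupId;
    }
}
```

A per-partition scan (as described earlier in this thread) would then see every row of a group together, without serializing whole ArrayLists as single values.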
*Val:* I would suggest storing one value per DB row to avoid duplicate data
and therefore duplicate serialization. It also looks like loading the data
in a multithreaded fashion may help: execute the query first, then do the DB
row parsing and saving into the cache in several parallel threads. You can
utilize CacheLoadOnlyStoreAdapter for this.
Is there any working example you can point me to?
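For reference, a minimal sketch of what a CacheLoadOnlyStoreAdapter subclass looks like (not a tested, working example; the record format and class name are made up, and a real store would stream rows from the Oracle query instead of a fixed list):

```java
import java.util.Arrays;
import java.util.Iterator;
import org.apache.ignite.cache.store.CacheLoadOnlyStoreAdapter;
import org.apache.ignite.lang.IgniteBiTuple;
import org.jetbrains.annotations.Nullable;

// The adapter pulls raw records from inputIterator() and hands them to
// parse() in several parallel threads (see setThreadsCount on the adapter).
public class CsvRowStore extends CacheLoadOnlyStoreAdapter<Integer, String, String> {
    @Override
    protected Iterator<String> inputIterator(@Nullable Object... args) {
        // In a real store this would iterate over the DB result set.
        return Arrays.asList("1,foo", "2,bar").iterator();
    }

    @Override
    @Nullable
    protected IgniteBiTuple<Integer, String> parse(String rec, @Nullable Object... args) {
        String[] parts = rec.split(",");
        // Returning null skips a record; here every record becomes an entry.
        return new IgniteBiTuple<>(Integer.parseInt(parts[0]), parts[1]);
    }
}
```

The store would be wired in via CacheConfiguration.setCacheStoreFactory(...) and triggered with cache.loadCache(null), so that parsing and cache population run in parallel threads while the query itself executes once.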

View this message in context: http://apache-ignite-users.70518.x6.nabble.com/Data-Loading-Performance-Issue-tp1958p1999.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.
