ignite-user mailing list archives

From hueb1 <eric...@finra.org>
Subject Super slow data loading performance when more nodes added
Date Sun, 23 Aug 2015 19:04:22 GMT
I'm loading a 226 MB file with about 7 million lines in it, and I plan to add
each line as a separate cache entry.

My cache configuration is 

I've written custom code that breaks the input file into blocks, and each
block is downloaded and processed in parallel as a compute task. Each
compute task creates its own DataStreamer to the same distributed cache and
writes lines to the cache as it reads them from its block of the file.
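The block-splitting step described above can be sketched as follows. This is a minimal sketch, not the code from this post: the function names `split_into_line_blocks` and `read_block_lines` are hypothetical, the file is assumed to be UTF-8 text with newline-terminated lines, and each line's byte offset is assumed to serve as its cache key, which keeps keys unique across parallel tasks without any coordination. The actual DataStreamer call is left out because the post does not show it.

```python
import os

def split_into_line_blocks(path, num_blocks):
    """Hypothetical helper: return contiguous (start, end) byte ranges that
    cover the whole file, with every cut placed right after a newline so that
    no line is split between two blocks."""
    size = os.path.getsize(path)
    if size == 0:
        return []
    target = max(1, size // num_blocks)
    ranges = []
    start = 0
    with open(path, "rb") as f:
        while start < size:
            end = min(start + target, size)
            if end < size:
                # Push the cut forward to just past the next newline.
                f.seek(end)
                f.readline()
                end = f.tell()
            ranges.append((start, end))
            start = end
    return ranges

def read_block_lines(path, start, end):
    """Hypothetical helper: yield (byte_offset, line) pairs for the lines in
    [start, end). The byte offset is assumed to be used as the cache key."""
    with open(path, "rb") as f:
        f.seek(start)
        offset = start
        while offset < end:
            line = f.readline()
            if not line:
                break
            yield offset, line.rstrip(b"\r\n").decode("utf-8")
            offset += len(line)
```

Each (start, end) range would be handed to one compute task, which iterates `read_block_lines` and passes each (offset, line) pair to its streamer.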

Here are the metrics:
Nodes = 1
Threads = 1
Time = 35 seconds

Nodes = 1
Threads = 10
Time = 17 seconds

Nodes = 2 (on same host)
Threads = 20 (10 per node)
Time = 25 seconds

Nodes = 2 (on different hosts)
Threads = 20 (10 per node)
Time = ?  (manually killed after 5 minutes)
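To compare the runs on a common scale, the timings above can be turned into throughput and speedup figures. This sketch treats "about 7 million" lines as exactly 7,000,000 and uses 300 seconds for the killed run, so that row is only an upper bound on throughput.

```python
# Figures taken from the runs above; "about 7 million" is assumed to be 7,000,000.
TOTAL_LINES = 7_000_000
FILE_MB = 226

runs = [
    ("1 node, 1 thread", 35),
    ("1 node, 10 threads", 17),
    ("2 nodes, same host, 20 threads", 25),
    ("2 nodes, different hosts, 20 threads", 300),  # killed at 5 min: time is a lower bound
]

def summarize(runs, baseline_seconds):
    """Return (label, lines/s, MB/s, speedup vs. baseline) for each run."""
    return [
        (label, TOTAL_LINES / seconds, FILE_MB / seconds, baseline_seconds / seconds)
        for label, seconds in runs
    ]

for label, lines_per_s, mb_per_s, speedup in summarize(runs, runs[0][1]):
    print(f"{label:38s} {lines_per_s:>10,.0f} lines/s {mb_per_s:6.2f} MB/s {speedup:5.2f}x")
```

Under these assumptions the single-node runs come out at roughly 200,000 and 412,000 lines/s, the same-host two-node run at about 280,000 lines/s, and the cross-host run at under about 23,300 lines/s, more than 8 times slower than the single-threaded baseline.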

Adding more threads to a single-node run sped things up, but adding a second
node on the same host slowed it down, and adding a node on a separate host
made it impossible to complete the load at all. Is having each thread create
its own DataStreamer to the shared cache what's causing this "reverse"
horizontal scalability?

What is the recommended approach to quickly load a large file into a
distributed cache? We have a use case that requires loading 1 GB files into
main memory as fast as possible. Any suggestions would be appreciated.

View this message in context: http://apache-ignite-users.70518.x6.nabble.com/Super-slow-data-loading-performance-when-more-nodes-added-tp1105.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.
