ignite-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Andrey Kornev <andrewkor...@hotmail.com>
Subject RE: Computation on NodeEntries
Date Tue, 15 Dec 2015 16:51:10 GMT

Yakov's approach works perfectly when one needs to process all (or most) of the entries in
the cache. However, in my case I may have 200 million entries in the cache and only, for example,
10 of them would be relevant for a particular run of the computation. I can't do a ScanQuery.
I must use a SQL query to quickly filter the data out and process the 10 matching the query.

Hope it makes it more clear.

Date: Tue, 15 Dec 2015 19:11:01 +0300
Subject: Re: Computation on NodeEntries
From: alexey.goncharuk@gmail.com
To: user@ignite.apache.org

I did not catch why Yakov's suggestion of data processing on per-partition basis does not
work for you? I assume that during this processing the cache is not updated concurrently because
otherwise the task does not make sense since there are no full cache snapshots in Ignite (yet).
To summarize what has been posted in this thread so far (Sergi, pls correct me if I am wrong):1)
SQL query issued from a single node is guaranteed to get consistent results on changing topology
(local should be set to false). Currently there is no 'native' way to parallelize SQL query
results processing, such as limit SQL query to one partition.
2) Scan query issued for a single partition is guaranteed to get consistent results on changing
topology (local should be set to false). Note that setting local=true and issuing a query
locally is not guaranteed to get consistent results. 
Having said that, I see no problem with the following approach:1) Create a compute task that
will create a job for each partition2) Map each partition's job to the partition's primary
node3) Execute a per-partition Scan query inside the job.
In a worst-case scenario, if a job arrives on the node which has already lost the partition
ownership, data will be fetched from a remote node.
You can open a ticket to fail per-partition Scan query if local flag is set to true and partition
has been moved - in this case 'wrong' jobs could be failed over to correct nodes. 		 	   
View raw message