ignite-user mailing list archives

From "Roger Fischer (CW)" <rfis...@Brocade.com>
Subject RE: Cassandra Cache Store: How are loadCache() queries distributed
Date Fri, 04 Aug 2017 15:49:55 GMT
Thanks, Igor.

That enhancement will be very useful. Both faster (parallel) loading and greater efficiency (not
transferring all data <n> times) are highly desirable.

Roger

From: Igor Rudyak [mailto:irudyak@gmail.com]
Sent: Thursday, August 03, 2017 10:58 PM
To: user@ignite.apache.org
Subject: Re: Cassandra Cache Store: How are loadCache() queries distributed

Hi Roger,

As of now, the Cassandra Cache Store loadCache() implementation is pretty straightforward: every
Ignite node sends all of the provided CQL queries. There is no query analysis to distribute
the data loading routine among the cluster nodes.
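
The behaviour described above can be illustrated with a small standalone simulation. This is only a sketch under stated assumptions, not the Cassandra Cache Store code: the class LoadCacheSimulation, the ownsKey() affinity rule (primary = key mod node count, backups on the following nodes) and the row counts are all hypothetical. It shows only that when every node runs every query, the number of executions and the transferred volume grow with the number of Ignite nodes, while each node keeps only the keys it owns.

```java
import java.util.ArrayList;
import java.util.List;

public class LoadCacheSimulation {
    /** Counters collected from one simulated loadCache() call. */
    static final class Result {
        final int queryExecutions; // CQL executions seen by Cassandra
        final int rowsTransferred; // rows shipped from Cassandra to Ignite nodes
        final int rowsStored;      // rows kept by Ignite (primary + backup copies)

        Result(int queryExecutions, int rowsTransferred, int rowsStored) {
            this.queryExecutions = queryExecutions;
            this.rowsTransferred = rowsTransferred;
            this.rowsStored = rowsStored;
        }
    }

    // Hypothetical affinity rule: primary = key mod nodes, backups on the next nodes.
    // Real Ignite affinity functions work differently; this is for illustration only.
    static boolean ownsKey(int node, int key, int nodes, int backups) {
        int primary = Math.floorMod(key, nodes);
        for (int i = 0; i <= backups; i++) {
            if ((primary + i) % nodes == node) {
                return true;
            }
        }
        return false;
    }

    // Every node executes every query and discards the rows it does not own.
    static Result simulate(int nodes, int backups, List<List<Integer>> queryResults) {
        int executions = 0;
        int transferred = 0;
        int stored = 0;
        for (int node = 0; node < nodes; node++) {
            for (List<Integer> rows : queryResults) {
                executions++;
                for (int key : rows) {
                    transferred++;
                    if (ownsKey(node, key, nodes, backups)) {
                        stored++;
                    }
                }
            }
        }
        return new Result(executions, transferred, stored);
    }

    public static void main(String[] args) {
        // Assumed workload: 6 queries, each returning 10 distinct keys.
        List<List<Integer>> queryResults = new ArrayList<>();
        for (int q = 0; q < 6; q++) {
            List<Integer> rows = new ArrayList<>();
            for (int i = 0; i < 10; i++) {
                rows.add(q * 10 + i);
            }
            queryResults.add(rows);
        }
        Result r = simulate(3, 1, queryResults);
        System.out.println("query executions: " + r.queryExecutions);
        System.out.println("rows transferred: " + r.rowsTransferred);
        System.out.println("rows stored:      " + r.rowsStored);
    }
}
```

With 3 nodes, 1 backup and 6 queries of 10 rows each, this model gives 18 query executions and 180 transferred rows, of which 120 are stored (60 keys, each held by a primary and one backup).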

There is an enhancement ticket created for this: https://issues.apache.org/jira/browse/IGNITE-3962

Igor



On Thu, Aug 3, 2017 at 2:29 PM, Roger Fischer (CW) <rfische@brocade.com> wrote:
Hello,

Could someone please explain to me how loadCache() queries are distributed to the Cassandra
instances when using the Cassandra Cache Store module?

I used Ignite logging and Cassandra server tracing (system_traces.sessions) to try to determine
how queries are distributed, but I can’t make sense of what I have observed.

What I am quite sure of: an Ignite server stores the objects for which it is the primary or a backup,
and ignores the other objects it receives from Cassandra.

I first tried a load-all scenario, with one query (select * from table) passed in the loadCache()
call.

Initially, it looked like each Ignite server sends the query to one Cassandra node. That seems
reasonable.

However, I have also observed cases when each Ignite server sends the query to more than one
Cassandra node. Why?

Then I tried calling loadCache() with multiple queries. Specifically, I created one query for
each Cassandra partition. Best practice for Cassandra is to limit each query to a single partition.
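
A minimal sketch of building one query per partition might look like the following. The table name, the partition key column and its values are placeholders, not taken from the actual schema, and the sketch assumes a single integer partition key column.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PartitionQueries {
    // Builds one CQL statement per partition key value, so that each
    // query is restricted to a single Cassandra partition.
    static List<String> perPartitionQueries(String table, String partitionColumn,
                                            List<Integer> partitionValues) {
        List<String> queries = new ArrayList<>();
        for (int value : partitionValues) {
            queries.add("select * from " + table
                    + " where " + partitionColumn + " = " + value);
        }
        return queries;
    }

    public static void main(String[] args) {
        // Hypothetical keyspace, table and partition key column.
        List<String> queries = perPartitionQueries(
                "my_keyspace.my_table", "bucket", Arrays.asList(0, 1, 2, 3, 4, 5));
        for (String q : queries) {
            System.out.println(q);
        }
        // With a configured Ignite cache, the list could then be passed on as
        // cache.loadCache(null, queries.toArray());
    }
}
```

The resulting strings are what gets handed to loadCache() as its variable arguments.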

One test seemed to imply that each Ignite server sends all queries, distributing them across
the available Cassandra nodes. This seems reasonable.

However, in another test one query (out of 6) got sent (really executed in Cassandra) only
once, most got sent twice, and a few three times. With 3 Ignite servers, I would have expected
each query to be sent 3 times (once from each Ignite server).

I am quite suspicious of that last observation, as it would invalidate what I stated earlier
as “quite sure of”. Maybe Cassandra did not record all queries in the sessions table.

So how does Ignite handle a loadCache() request when there are <n> Ignite servers and
<m> Cassandra servers? The loadCache() call is made from an Ignite client.
a) when there is a single query provided to loadCache().
b) when there are multiple (<r>) queries provided to loadCache().
c) does it make any difference if the query includes all Cassandra partitioning key columns
in the where clause (i.e. would the Cassandra Cache Store analyze the query to optimize the
distribution)?
d) does it make any difference if the query includes the Ignite affinity key (i.e. would the
Cassandra Cache Store analyze the query to optimize which queries to send from where)?

Thanks…

Roger


