cassandra-user mailing list archives

From Reid Pinchback <rpinchb...@tripadvisor.com>
Subject Re: execute is faster than execute_async?
Date Wed, 11 Dec 2019 16:07:49 GMT
Also note that you should expect async operations to be slower on a call-by-call basis.
Async protocols have added overhead. The point of them really is to leave the client free
to interleave other computing activity between the async calls. It’s not usually a better
way to do batch writing. That’s not an observation specific to C*; it’s just about understanding
the role of async operations in computing.
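
To illustrate the interleaving point in general terms (nothing here is C*-specific), here is a minimal sketch using Python's standard concurrent.futures module. The function slow_call is a hypothetical stand-in for any remote request, and the 50 ms sleep is an arbitrary assumption, not a measured latency.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_call(x):
    """Hypothetical stand-in for a remote request (not a real driver call)."""
    time.sleep(0.05)  # simulated network round trip
    return x * 2


def interleaved(x):
    """Start the request, do local work while it is in flight, then collect."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(slow_call, x)  # returns immediately
        local = sum(range(1000))            # client is free to do other work
        remote = future.result()            # block only when the answer is needed
    return local, remote
```

The single call is no faster than a blocking one; the gain, if any, comes only from the local work done while the request is outstanding.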

There is some subtlety with distributed services like C* where you’re round-robining the
calls around the cluster, where repeated async calls can win relative to sync because you
aren’t waiting to hand off the next unit of work to a different node, but once the activity
starts to queue up on any kind of resource, even just TCP buffering, you’ll likely be back
to a situation where all you are measuring is the net difference in protocol overhead for
async vs sync.

One of the challenges with performance testing is you have to be pretty clear on what exactly
it is you are exercising, or all you can conclude from different numbers is that different
numbers can exist.

R

From: Alexander Dejanovski <alex@thelastpickle.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Wednesday, December 11, 2019 at 7:44 AM
To: user <user@cassandra.apache.org>
Subject: Re: execute is faster than execute_async?

Hi,

you can check this piece of documentation from DataStax: https://docs.datastax.com/en/developer/python-driver/3.20/api/cassandra/cluster/#cassandra.cluster.Session.execute_async

The usual way of doing this is to send a bunch of execute_async() calls, adding the returned
futures to a list. Once the list reaches the chosen threshold (usually we send around 100
queries and wait for them to finish before moving on to the next ones), loop through the
futures and call the result() method on each to block until it completes.
It should look like this:


futures = []
for i, query in enumerate(queries):
    futures.append(session.execute_async(query))
    if len(futures) >= 100 or i == len(queries) - 1:
        for future in futures:
            results = future.result()  # will block until the query finishes
        futures = []  # empty the list



I haven't tested the code above, but it should give you an idea of how this can be implemented.
Sending hundreds or thousands of queries without waiting for a result will DDoS the cluster,
so you should always implement some throttling.
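
The same throttling idea can be wrapped in a small helper. This is only a sketch: execute_in_chunks and chunk_size are hypothetical names, and session is assumed to behave like cassandra.cluster.Session, that is, execute_async(query) returns a future whose result() blocks and raises on failure. It has not been run against a real cluster.

```python
def execute_in_chunks(session, queries, chunk_size=100):
    """Run queries via execute_async with at most chunk_size in flight.

    Assumes `session` behaves like cassandra.cluster.Session.
    Returns (results, errors); errors holds (query, exception) pairs.
    """
    results, errors = [], []
    for start in range(0, len(queries), chunk_size):
        chunk = queries[start:start + chunk_size]
        # Send the whole chunk without waiting for any response.
        futures = [(query, session.execute_async(query)) for query in chunk]
        # Block on every future before sending the next chunk.
        for query, future in futures:
            try:
                results.append(future.result())
            except Exception as exc:  # keep going if a single query fails
                errors.append((query, exc))
    return results, errors
```

Collecting errors instead of raising on the first one is a design choice for bulk loads; raise immediately if a partial write is unacceptable. The Python driver also ships cassandra.concurrent.execute_concurrent, which takes a concurrency argument and may be a better fit than hand-rolled chunking.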

Cheers,

-----------------
Alexander Dejanovski
France
@alexanderdeja

Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com


On Wed, Dec 11, 2019 at 10:42 AM Jordan West <jordanrw@gmail.com> wrote:
I’m not very familiar with the python client unfortunately. If it helps: In Java, async
would return futures and at the end of submitting each batch you would block on them by calling
get.

Jordan

On Wed, Dec 11, 2019 at 1:37 AM lampahome <pahome.chen@mirlab.org> wrote:


On Wednesday, December 11, 2019 at 4:34 PM, Jordan West <jordanrw@gmail.com> wrote:
Hi,

Have you tried batching calls to execute_async with periodic blocking for the batch’s responses?

Can you give me some keywords to search for on batching execute_async calls?

PS: I use the Python version.