cassandra-commits mailing list archives

From "Stefania (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-11053) COPY FROM on large datasets: fix progress report and debug performance
Date Tue, 16 Feb 2016 10:13:18 GMT


Stefania commented on CASSANDRA-11053:

I have experimented with CPU affinity by pinning each worker process to a core, but whilst this
gave slightly better results locally, on AWS it actually made things worse.
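
For illustration only, pinning a worker to a core could look roughly like the sketch below. It is not the code from the patch: {{pin_to_core}}, {{worker}} and {{run_pinned_workers}} are hypothetical names, it assumes a POSIX host (it uses the {{fork}} start method), and {{os.sched_setaffinity}} is only available on Linux, so the sketch falls back to doing nothing elsewhere.

```python
import multiprocessing as mp
import os


def pin_to_core(core):
    """Pin the calling process to one core; return the resulting affinity set.

    os.sched_setaffinity is Linux-only, so return None where it is missing.
    """
    if not hasattr(os, "sched_setaffinity"):
        return None
    os.sched_setaffinity(0, {core})  # pid 0 means "the calling process"
    return os.sched_getaffinity(0)


def worker(core, conn):
    """Hypothetical worker: pin itself first, then report the affinity it got."""
    affinity = pin_to_core(core)
    conn.send((core, None if affinity is None else sorted(affinity)))
    conn.close()


def run_pinned_workers(num_workers=2):
    """Start num_workers processes, each pinned to one of the allowed cores."""
    if hasattr(os, "sched_getaffinity"):
        allowed = sorted(os.sched_getaffinity(0))
    else:
        allowed = list(range(os.cpu_count() or 1))
    ctx = mp.get_context("fork")  # assumption: POSIX host
    procs = []
    for i in range(num_workers):
        core = allowed[i % len(allowed)]  # wrap around if workers > cores
        recv_end, send_end = ctx.Pipe(duplex=False)
        proc = ctx.Process(target=worker, args=(core, send_end))
        proc.start()
        send_end.close()  # the parent only reads from this pipe
        procs.append((proc, recv_end))
    report = []
    for proc, conn in procs:
        report.append(conn.recv())
        proc.join()
    return report
```

Each entry of the returned list is a {{(core, affinity)}} pair, which makes it easy to check that every worker ended up on the core it was assigned.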

I've also examined the {{strace}} output locally and the most frequent system calls are
{{futex}}, {{read}}, {{write}} and {{poll}}. To reduce contention, I've replaced the Python queue
with multiple point-to-point pipes (the queue was implemented over a single pipe with interprocess
locks). I didn't see much improvement locally, but perhaps it matters more on AWS, since locally
I can only run 2 worker processes before I max out the cluster that also runs locally. By removing
the Python queue I was also able to remove one thread, which in Python is a good thing due to
the GIL (Global Interpreter Lock).
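
To make the idea of point-to-point pipes concrete, here is a minimal sketch in which the parent owns one {{multiprocessing.Pipe}} per worker instead of sharing a single queue. It is an assumption-laden illustration rather than the patch itself: {{worker}} and {{distribute}} are hypothetical names, chunks are handed out round-robin, a {{None}} sentinel ends the stream, and the {{fork}} start method assumes a POSIX host.

```python
import multiprocessing as mp


def worker(conn):
    """Count rows received on this worker's own pipe until the None sentinel."""
    rows = 0
    while True:
        chunk = conn.recv()
        if chunk is None:  # sentinel: no more work
            break
        rows += len(chunk)  # stand-in for encoding and sending the rows
    conn.send(rows)  # report once, after all chunks have been consumed
    conn.close()


def distribute(chunks, num_workers=2):
    """Send chunks round-robin over one dedicated pipe per worker."""
    ctx = mp.get_context("fork")  # assumption: POSIX host
    channels = []
    for _ in range(num_workers):
        parent_end, child_end = ctx.Pipe()  # duplex, private to this worker
        proc = ctx.Process(target=worker, args=(child_end,))
        proc.start()
        child_end.close()  # the parent keeps only its own end
        channels.append((proc, parent_end))

    for i, chunk in enumerate(chunks):
        channels[i % num_workers][1].send(chunk)

    totals = []
    for proc, conn in channels:
        conn.send(None)
        totals.append(conn.recv())
        proc.join()
    return totals
```

Because every pipe has exactly one writer and one reader in each direction, no interprocess lock is needed to share it, which is the contention the shared queue introduced.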

I plan to test this implementation on AWS, together with an additional suggestion to increase
time slicing ({{schedtool -B}}), then if everything works as expected I will move the ticket
to patch available.
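
For reference, {{schedtool -B}} selects the {{SCHED_BATCH}} policy, and on Linux the same request can be made from inside a Python process. The sketch below is only an illustration under that assumption: {{use_batch_scheduling}} is a hypothetical helper, and it simply returns {{False}} when the platform lacks the call or the kernel refuses it.

```python
import os


def use_batch_scheduling(pid=0):
    """Try to switch pid (0 = the calling process) to SCHED_BATCH.

    Returns True on success, False if the call is unsupported or refused.
    """
    if not hasattr(os, "sched_setscheduler") or not hasattr(os, "SCHED_BATCH"):
        return False  # non-Linux platform
    try:
        # SCHED_BATCH requires a static priority of 0
        os.sched_setscheduler(pid, os.SCHED_BATCH, os.sched_param(0))
    except OSError:
        return False  # e.g. blocked by a sandbox or missing permission
    return os.sched_getscheduler(pid) == os.SCHED_BATCH
```

A worker could call {{use_batch_scheduling()}} once at start-up, as an alternative to launching the whole tool under {{schedtool}}.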

It's worth noting that the driver doesn't coalesce messages on the socket at present. This
could be detrimental in virtualized environments like AWS, especially if enhanced networking
is not available. However, we would probably only need to worry about this once our encoding
functions are faster; at the moment the bottleneck is still encoding, so I would leave this for a future

> COPY FROM on large datasets: fix progress report and debug performance
> ----------------------------------------------------------------------
>                 Key: CASSANDRA-11053
>                 URL:
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Tools
>            Reporter: Stefania
>            Assignee: Stefania
>             Fix For: 2.1.x, 2.2.x, 3.0.x, 3.x
>         Attachments: copy_from_large_benchmark.txt, copy_from_large_benchmark_2.txt,
parent_profile.txt, parent_profile_2.txt, worker_profiles.txt, worker_profiles_2.txt
> Running COPY FROM on a large dataset (20G divided in 20M records) revealed two issues:
> * The progress report is incorrect: it is very slow until almost the end of the test,
at which point it catches up extremely quickly.
> * The performance in rows per second is similar to running smaller tests with a smaller
cluster locally (approx 35,000 rows per second). As a comparison, cassandra-stress manages
50,000 rows per second under the same set-up, making it roughly 1.4 times faster.
> See attached file _copy_from_large_benchmark.txt_ for the benchmark details.
