cassandra-user mailing list archives

From "Petter. Andreas" <>
Subject Slow performance because of used-up "Waste" in AtomicBTreeColumns
Date Thu, 23 Jul 2015 10:55:14 GMT
Hello everyone,

we are experiencing performance problems: Cassandra becomes overloaded (dropped mutations
and nodes dropping out) under the following workload:

create table test (year bigint, spread bigint, time bigint, batchid bigint, value set<text>,
primary key ((year, spread), time, batchid))
Data is inserted with an UPDATE statement (the "+" operator merges the sets). The data _is_ sorted
before the mutations are executed on the session. The number of inserts ranges from 400k to a few
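For illustration, a minimal in-memory model of what each such UPDATE does to a row. This is not Cassandra or driver code: the class `SetMergeModel`, the record `Key` and the methods are hypothetical, and the bind markers in the CQL string are only an assumption about how the statement is prepared. The point is that `value = value + ?` is a set union, so repeated or reordered updates for the same primary key end up with the same set.

```java
import java.util.*;

// Hypothetical model, not Cassandra code: one set<text> per primary key.
public class SetMergeModel {
    // The kind of statement described above; bind markers are illustrative.
    static final String UPDATE_CQL =
        "UPDATE test SET value = value + ? WHERE year = ? AND spread = ? AND time = ? AND batchid = ?";

    // Partition key (year, spread) plus clustering columns (time, batchid).
    record Key(long year, long spread, long time, long batchid) {}

    private final Map<Key, Set<String>> rows = new HashMap<>();

    // Models "value = value + ?": union the new elements into the row's set,
    // creating the row if it does not exist yet (no read of the old value is needed).
    void appendToSet(Key key, Set<String> elements) {
        rows.computeIfAbsent(key, k -> new TreeSet<>()).addAll(elements);
    }

    Set<String> get(Key key) {
        return rows.getOrDefault(key, Collections.emptySet());
    }

    public static void main(String[] args) {
        SetMergeModel table = new SetMergeModel();
        Key k = new Key(2015, 7, 1000L, 1L);
        table.appendToSet(k, Set.of("a", "b"));
        table.appendToSet(k, Set.of("b", "c"));
        System.out.println(table.get(k)); // prints [a, b, c]
    }
}
```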

Originally we were using Scalding/Summingbird and thought the problem was in our Cassandra storage code.
To test that, I wrote a simple Cascading/Hadoop job (using the DataStax driver, not BulkOutputFormat).
I was a little surprised to still see Cassandra _overload_ (with 3 reducers/Hadoop writers
and 3 co-located Cassandra nodes, and also with a 4/4 setup). The internal cause
seems to be that many worker threads go into state BLOCKED in AtomicBTreeColumns.addAllWithSizeDelta,
because the so-called "waste" is used up and Cassandra switches to pessimistic locking.
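A simplified sketch of the optimistic-then-pessimistic pattern described above, to make the BLOCKED threads easier to picture. It is not Cassandra's AtomicBTreeColumns: the class `WasteTrackingCell`, the simple failure counter and the `wasteLimit` threshold are hypothetical (the real tracker is more elaborate and, as far as I can tell, measures wasted bytes over time). Each writer builds a new copy and tries to install it with compare-and-swap; a lost race throws the copy away ("waste"). Once the limit is reached, writers serialize on a monitor, and threads waiting to enter a `synchronized` block are reported as BLOCKED.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.UnaryOperator;

// Hypothetical illustration only, not Cassandra code.
public class WasteTrackingCell<T> {
    private final AtomicReference<T> ref;
    private final AtomicInteger wasted = new AtomicInteger(); // discarded attempts so far
    private final int wasteLimit;                              // hypothetical threshold
    private final Object lock = new Object();

    public WasteTrackingCell(T initial, int wasteLimit) {
        this.ref = new AtomicReference<>(initial);
        this.wasteLimit = wasteLimit;
    }

    public T get() { return ref.get(); }

    // This sketch never resets the counter; once pessimistic, it stays pessimistic.
    public boolean isPessimistic() { return wasted.get() >= wasteLimit; }

    // fn must be side-effect free: it may run several times if a CAS is lost.
    public T update(UnaryOperator<T> fn) {
        // Optimistic phase: lock-free CAS loop, counting every lost race as waste.
        while (!isPessimistic()) {
            T current = ref.get();
            T updated = fn.apply(current);               // build a new copy
            if (ref.compareAndSet(current, updated)) return updated;
            wasted.incrementAndGet();                    // our copy is thrown away
        }
        // Pessimistic phase: writers queue on the monitor (threads show as BLOCKED).
        // CAS is still used because optimistic writers may still be racing.
        synchronized (lock) {
            while (true) {
                T current = ref.get();
                T updated = fn.apply(current);
                if (ref.compareAndSet(current, updated)) return updated;
                wasted.incrementAndGet();
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        WasteTrackingCell<Long> counter = new WasteTrackingCell<>(0L, 1_000);
        Thread[] writers = new Thread[8];
        for (int i = 0; i < writers.length; i++) {
            writers[i] = new Thread(() -> {
                for (int j = 0; j < 100_000; j++) counter.update(v -> v + 1);
            });
            writers[i].start();
        }
        for (Thread w : writers) w.join();
        // Whether the limit is reached depends on contention, so it varies between runs.
        System.out.println(counter.get() + " updates, pessimistic=" + counter.isPessimistic());
    }
}
```

With many writers hitting the same value, lost races pile up quickly; with writers that rarely collide, the counter stays low and the lock is never taken.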

However, when I rewrote the job in plain Hadoop mapred (without Cascading) but with the same
storage abstraction for writing, Cassandra _did_not_overload_: the job had the great
write performance I'm used to, and threads did not go into state BLOCKED. We're totally
lost and puzzled.

So I have a few questions:
1. What is this "waste" used for? Is it a braking or load-shedding mechanism? Why is locking
used in AtomicBTreeColumns at all?
2. Is it OK to order the columns before the inserts are performed?
3. What could be the reason that "waste" is used up in the Cascading job but not in
the plain Hadoop job (sort order?)?
4. Is there any way to avoid using up "waste" other than adding nodes? Scaling out does not
seem to be the answer, since the plain Hadoop job runs in a Cassandra-"friendly" way.

Thanks in advance,

SEEBURGER AG
Registered Office: Edisonstr. 1, D-75015 Bretten
Tel.: 07252 / 96 - 0
Fax: 07252 / 96 - 2222
Internet:
e-mail:
Commercial Register: HRB 240708 Mannheim
SEEBURGER Executive Board: Bernd Seeburger, Axel Haas, Michael Kleeberg, Friedemann Heinz,
Dr. Martin Kuntz, Matthias Feßenbecker
Chairperson of the SEEBURGER Supervisory Board: Prof. Dr. Simone Zeuchner
