incubator-cassandra-user mailing list archives

From Paul Cichonski <paul.cichon...@lithium.com>
Subject RE: heavy insert load overloads CPUs, with MutationStage pending
Date Thu, 12 Sep 2013 20:26:03 GMT
I'm running Cassandra 1.2.6 without compact storage on my tables. The trick is making your
Astyanax (I'm running 1.56.42) mutation work with the CQL table definition. This is definitely
a bit of a hack, since most of the advice says not to mix the CQL and Thrift APIs, so it's your
call how far you want to go. If you still want to test it out, you need to leverage the
Astyanax CompositeColumn construct to make it work (https://github.com/Netflix/astyanax/wiki/Composite-columns)

I've provided a slightly modified version of what I am doing below:

CQL table def:

CREATE TABLE standard_subscription_index
(
 	subscription_type text,
	subscription_target_id text,
	entitytype text,
	entityid int,
	creationtimestamp timestamp,
	indexed_tenant_id uuid,
	deleted boolean,
    PRIMARY KEY ((subscription_type, subscription_target_id), entitytype, entityid)
);

ColumnFamily definition:

private static final ColumnFamily<SubscriptionIndexCompositeKey, SubscribingEntityCompositeColumn>
COMPOSITE_ROW_COLUMN = new ColumnFamily<SubscriptionIndexCompositeKey, SubscribingEntityCompositeColumn>(
	SUBSCRIPTION_CF_NAME, new AnnotatedCompositeSerializer<SubscriptionIndexCompositeKey>(SubscriptionIndexCompositeKey.class),
	new AnnotatedCompositeSerializer<SubscribingEntityCompositeColumn>(SubscribingEntityCompositeColumn.class));


SubscriptionIndexCompositeKey is a class that contains the fields from the row key (e.g.,
subscription_type, subscription_target_id), and SubscribingEntityCompositeColumn contains
the fields from the composite column (as it would look if you view your data using Cassandra-cli),
so: entityType, entityId, columnName. The columnName field is the tricky part, as it defines
how to interpret the column value (i.e., if it is the value for creationtimestamp, the
column might be "someEntityType:4:creationtimestamp").
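To make that mapping concrete, here is a rough, self-contained sketch of what those two classes model. This is illustrative only: in real code the fields would carry Astyanax @Component(ordinal = ...) annotations so the AnnotatedCompositeSerializer knows their order, and the class bodies below are my own simplification, not the actual implementation.

```java
// Illustrative sketch of the key/column classes described above.
// In actual Astyanax code each field is annotated with @Component(ordinal = N).
public class SubscriptionIndexSketch {

    // Mirrors the partition key: (subscription_type, subscription_target_id)
    public static class SubscriptionIndexCompositeKey {
        public final String subscriptionType;      // @Component(ordinal = 0) in real code
        public final String subscriptionTargetId;  // @Component(ordinal = 1)

        public SubscriptionIndexCompositeKey(String type, String targetId) {
            this.subscriptionType = type;
            this.subscriptionTargetId = targetId;
        }
    }

    // Mirrors the clustering columns plus the CQL column name
    public static class SubscribingEntityCompositeColumn {
        public final String entityType;  // @Component(ordinal = 0)
        public final int entityId;       // @Component(ordinal = 1)
        public final String columnName;  // @Component(ordinal = 2): which CQL column this cell holds

        public SubscribingEntityCompositeColumn(String entityType, int entityId, String columnName) {
            this.entityType = entityType;
            this.entityId = entityId;
            this.columnName = columnName;
        }

        // How the cell name looks when viewed through cassandra-cli
        public String cliName() {
            return entityType + ":" + entityId + ":" + columnName;
        }
    }

    public static void main(String[] args) {
        SubscribingEntityCompositeColumn col =
            new SubscribingEntityCompositeColumn("someEntityType", 4, "creationtimestamp");
        System.out.println(col.cliName()); // someEntityType:4:creationtimestamp
    }
}
```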

The actual mutation looks something like this:

final MutationBatch mutation = getKeyspace().prepareMutationBatch();
final ColumnListMutation<SubscribingEntityCompositeColumn> row = mutation.withRow(COMPOSITE_ROW_COLUMN,
		new SubscriptionIndexCompositeKey(targetEntityType.getName(), targetEntityId));

for (Subscription sub : subs) {
	row.putColumn(new SubscribingEntityCompositeColumn(sub.getEntityType().getName(), sub.getEntityId(),
				"creationtimestamp"), sub.getCreationTimestamp());
	row.putColumn(new SubscribingEntityCompositeColumn(sub.getEntityType().getName(), sub.getEntityId(),
				"deleted"), sub.isDeleted());
	row.putColumn(new SubscribingEntityCompositeColumn(sub.getEntityType().getName(), sub.getEntityId(),
				"indexed_tenant_id"), tenantId);
}

Hope that helps,
Paul


From: Keith Freeman [mailto:8forty@gmail.com] 
Sent: Thursday, September 12, 2013 12:10 PM
To: user@cassandra.apache.org
Subject: Re: heavy insert load overloads CPUs, with MutationStage pending

Ok, your results are pretty impressive, so I'm giving it a try.  I've made some initial attempts
to use Astyanax 1.56.37, but have run into some trouble:

  - it's not compatible with 1.2.8 client-side (NoSuchMethodErrors on org.apache.cassandra.thrift.TBinaryProtocol,
which changed its signature since 1.2.5)
  - even switching to C* 1.2.5 servers, it's been difficult getting simple examples to work
unless I use CF's that have "WITH COMPACT STORAGE"

How did you handle these problems?  How much effort did it take you to switch from datastax
to astyanax?  

I feel like I'm getting lost in a pretty deep rabbit-hole here.
On 09/11/2013 03:03 PM, Paul Cichonski wrote:
I was reluctant to use the thrift as well, and I spent about a week trying to get the CQL
inserts to work by partitioning the INSERTS in different ways and tuning the cluster.

However, nothing worked remotely as well as the batch_mutate when it came to writing a full
wide-row at once. I think Cassandra 2.0 makes CQL work better for these cases (CASSANDRA-4693),
but I haven't tested it yet.

-Paul

-----Original Message-----
From: Keith Freeman [mailto:8forty@gmail.com]
Sent: Wednesday, September 11, 2013 1:06 PM
To: user@cassandra.apache.org
Subject: Re: heavy insert load overloads CPUs, with MutationStage pending

Thanks, I had seen your stackoverflow post.  I've got hundreds of
(wide-) rows, and the writes are pretty well distributed across them.
I'm very reluctant to drop back to the thrift interface.

On 09/11/2013 10:46 AM, Paul Cichonski wrote:
How much of the data you are writing is going against the same row key?

I've experienced some issues using CQL to write a full wide-row at once
(across multiple threads) that exhibited some of the symptoms you have
described (i.e., high cpu, dropped mutations).

This question goes into it a bit more:
http://stackoverflow.com/questions/18522191/using-cassandra-and-cql3-how-do-you-insert-an-entire-wide-row-in-a-single-reque
I was able to solve my issue by switching to the thrift batch_mutate to write a full
wide-row at once instead of using many CQL INSERT statements.
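For anyone wondering what batch_mutate buys you here: its argument is a nested map keyed first by row key and then by column family, so every cell of a wide row travels in a single call. The sketch below models only that shape, with plain Strings standing in for the ByteBuffer keys and org.apache.cassandra.thrift.Mutation objects the real API uses; the table and key names are made up for the demo.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative model of the structure thrift's batch_mutate accepts:
// Map<rowKey, Map<columnFamily, List<mutation>>>. Strings stand in for
// ByteBuffer keys and thrift Mutation objects.
public class BatchMutateShape {

    public static Map<String, Map<String, List<String>>> buildBatch(
            String rowKey, List<String> cellNames) {
        Map<String, Map<String, List<String>>> batch = new HashMap<>();
        Map<String, List<String>> byColumnFamily = new HashMap<>();
        byColumnFamily.put("standard_subscription_index", new ArrayList<>(cellNames));
        batch.put(rowKey, byColumnFamily);
        return batch;
    }

    public static void main(String[] args) {
        List<String> cells = new ArrayList<>();
        cells.add("someEntityType:4:creationtimestamp");
        cells.add("someEntityType:4:deleted");
        Map<String, Map<String, List<String>>> batch = buildBatch("subType:targetId", cells);
        // One row key, one column family, all cells together in one round trip
        System.out.println(batch.get("subType:targetId").get("standard_subscription_index").size());
    }
}
```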

-Paul

-----Original Message-----
From: Keith Freeman [mailto:8forty@gmail.com]
Sent: Wednesday, September 11, 2013 9:16 AM
To: user@cassandra.apache.org
Subject: Re: heavy insert load overloads CPUs, with MutationStage
pending


On 09/10/2013 11:42 AM, Nate McCall wrote:
With SSDs, you can turn up memtable_flush_writers - try 3 initially
(1 by default) and see what happens. However, given that there are
no entries in 'All time blocked' for such, it may be something else.
Tried that, it seems to have reduced the loads a little after
everything warmed-up, but not much.
How are you inserting the data?
A java client on a separate box using the datastax java driver, 48
threads writing 100 records each iteration as prepared batch statements.
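For context, with the 1.x DataStax driver against Cassandra 1.2, a "100 records per iteration" batch typically ends up as a single BEGIN BATCH ... APPLY BATCH statement at the CQL level. The sketch below only builds that statement text to show the shape; the table and column names are invented for the demo and nothing here talks to a cluster.

```java
// Illustrative sketch of what "100 records each iteration as prepared batch
// statements" can look like at the CQL level: one BEGIN BATCH block per
// iteration. Table/column names are made up for the demo.
public class BatchCqlDemo {

    public static String buildBatchCql(int records) {
        StringBuilder sb = new StringBuilder("BEGIN BATCH\n");
        for (int i = 0; i < records; i++) {
            sb.append("  INSERT INTO demo_table (id, val) VALUES (?, ?);\n");
        }
        sb.append("APPLY BATCH;");
        return sb.toString();
    }

    public static void main(String[] args) {
        String cql = buildBatchCql(100);
        // Count the INSERT statements inside the batch
        System.out.println(cql.split("INSERT INTO", -1).length - 1); // 100
    }
}
```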

At 5000 records/sec, the servers just can't keep up, so the client backs up.
That's only about 5 MB of data/sec, which doesn't seem like much.  As I
mentioned, switching to SSDs didn't help much, so I'm assuming at
this point that the server overloads are what's holding up the client.


