cassandra-dev mailing list archives

From Cheng Ren <>
Subject Re: Cassandra Schema Change Performance Improvement
Date Tue, 29 Jul 2014 19:25:23 GMT
Hi Graham,

Thanks for your suggestion.

Your proposed 3rd solution looks great! However, we need to perform
low-level sstable operations, such as replication, backup, or sstable2json,
on only the data where K1=X. That would be relatively hard for us with a
single table, because rows with different K1 values are intermixed within
each sstable on a node.

That said, we believe what we're proposing here can be very useful for
other people in the Cassandra community as well. Attached is our proposed
patch for the template schema feature. Would the community consider
accepting this patch into the main branch of the latest Cassandra? If not,
could you give us feedback? Please let us know if you have any
concerns or suggestions regarding the change.


On Thu, Jul 24, 2014 at 5:56 PM, graham sanderson <> wrote:

> Just as a general observation, there is a third possible solution
> depending on the number of columns involved per K1 (which looks reasonable
> based on your numbers) but also on how much contention there is for updates
> to particular (K1, K2) values (which are always unique partitions in your
> options 1 & 2) and on any other query use cases you have.
> 3) Create a single table with a composite partition key (K1, hK2) and make
> K2 (the first) part of the clustering key; where hK2 is N bits of a good
> hash of K2… if N is reasonably small, you can easily read all K1 = X rows
> by selecting the 2^N keys (X, 0) thru (X, 2^N - 1)
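
The bucketing in option 3 above can be sketched in Python as follows. This is only an illustration: the use of MD5 as the "good hash", the value N = 4, and the names bucket, partition_key and partitions_for are all assumptions, not something specified in this thread.

```python
import hashlib

N_BITS = 4  # assumption: N = 4, giving 2**4 = 16 buckets per K1 value


def bucket(k2: str, n_bits: int = N_BITS) -> int:
    """Return hK2: the top n_bits (1..32) of an MD5 digest of k2."""
    digest = hashlib.md5(k2.encode("utf-8")).digest()
    first_32_bits = int.from_bytes(digest[:4], "big")
    return first_32_bits >> (32 - n_bits)


def partition_key(k1: str, k2: str, n_bits: int = N_BITS) -> tuple:
    """Composite partition key (K1, hK2); K2 itself would be a clustering column."""
    return (k1, bucket(k2, n_bits))


def partitions_for(k1: str, n_bits: int = N_BITS) -> list:
    """All 2^N partition keys (k1, 0) through (k1, 2^N - 1) to read for K1 = k1."""
    return [(k1, b) for b in range(2 ** n_bits)]
```

With N = 4, reading every row for K1 = X means reading the 16 partitions returned by partitions_for("X"), and any row written under partition_key("X", k2) falls in one of them.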
> On Jul 24, 2014, at 5:49 PM, Cheng Ren <> wrote:
> Hi,
> Schema changes are a performance pain point for us, because the schema is
> global information shared across the entire cluster. Our production Cassandra
> cluster holds many sets of column families: 1000 column families and
> 38301 columns in total, which add up to 3.2MB.
> We have a data model where the primary key is split into two parts, K1 and
> K2. Let's say the cardinality of K1 is small. We also have a requirement
> that we frequently want to scan all rows that belong to a particular value
> of K1.
> In this case Cassandra offers two possible solutions.
> 1) Create a single table with a composite key (K1, K2)
> 2) Create a table per K1, with primary key as K2
> Option #1: The number of tables is only 1, but we lose the ability
> to easily scan all rows where K1 = X without paying the penalty of reading
> all rows in the table.
> Option #2: This gives us the freedom to scan only a particular value of K1,
> but it leads to a significant, potentially unbounded increase in the number
> of tables. If the size of the set K1 is relatively small, though, this is a
> feasible option with a cleaner data interface.
> An example of this data model is a set of merchants with products.
> Then K1 = merchant_id and K2 = product_id. The number of
> merchants is still very small compared to the number of products.
> We chose option #2 because the size of the set K1 is relatively small for
> us. However, it creates a fair number of tables, one per K1, which all have
> exactly the same columns and metadata. Whenever we need to add or drop an
> attribute for all of these tables, it puts a lot of load on the entire
> cluster, and all backend pipelines are affected or even have to be shut down
> to accommodate the schema change.
> To reduce the load of this kind of schema change, we came up with a new
> feature called "*template*". We can create a template, and then create
> tables with that template.
> ex:
> create template template_table ( block_id text, PRIMARY KEY (block_id));
> create table table_a, table_b, table_c with template_table;
> This allows us to reduce the time spent on metadata gossip. Moreover, when
> we need to add one more attribute for all of our merchants, we just need to
> alter the template:
> alter template template_table add foo text;
> which also alters table_a, table_b, and table_c.
> We changed the system keyspace a bit to accommodate the template feature:
> schema_columnfamilies stores only the metadata of templates and
> non-templated column families.
> schema_columns stores only the column info of templates and non-templated
> column families.
> We also added a new table in the system keyspace called
> schema_columnfamilies_templated,
> which manages the mapping between templates and templated column families.
> So like this:
> schema_columnfamilies_templated:
> keyspace | columnfamily_name | template_name
> XXX      | table_a           | template_table
> XXX      | table_b           | template_table
> XXX      | table_c           | template_table
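
The mapping above can be modelled with a small in-memory sketch in Python. It is a toy illustration of why one template change reaches every templated table, not the code in the attached patch; the class SchemaModel and its method names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class SchemaModel:
    # template name -> {column name: type}; stands in for the template's
    # rows in schema_columnfamilies / schema_columns
    templates: dict = field(default_factory=dict)
    # (keyspace, table) -> template name; stands in for
    # schema_columnfamilies_templated
    templated: dict = field(default_factory=dict)

    def create_template(self, name, columns):
        self.templates[name] = dict(columns)

    def create_tables(self, keyspace, tables, template):
        if template not in self.templates:
            raise KeyError(f"unknown template: {template}")
        # Only one small mapping row per table; no column metadata is copied.
        for table in tables:
            self.templated[(keyspace, table)] = template

    def alter_template_add(self, template, column, column_type):
        # A single change to the template definition.
        self.templates[template][column] = column_type

    def columns_of(self, keyspace, table):
        # Columns are resolved through the template at lookup time.
        return self.templates[self.templated[(keyspace, table)]]


schema = SchemaModel()
schema.create_template("template_table", {"block_id": "text"})
schema.create_tables("XXX", ["table_a", "table_b", "table_c"], "template_table")
schema.alter_template_add("template_table", "foo", "text")
```

After the single alter_template_add call, columns_of reports the new foo column for table_a, table_b and table_c alike, because each table stores only a reference to the template.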
> We already have some performance results from our 15-node cluster. Normally,
> creating 400 tables takes hours for all the migration stage tasks
> to complete, but if we create 400 tables with templates, *it takes just
> 1 to 2 seconds*. It also works well for alter table.
> In the graph, "table #" is the number of existing tables in user keyspaces.
> We created 400 more tables and measured the time that all tasks in the
> migration stage took to complete. We also measured the migration task
> completion time for adding one column to a template, which also adds
> the column to all the column families using that template.
> Any feedback is greatly appreciated, and please also let us know if you
> have any questions.
> Thanks,
> Cheng
