cassandra-commits mailing list archives

From "Sylvain Lebresne (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-7444) Performance drops when creating large amount of tables
Date Wed, 25 Jun 2014 08:04:24 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-7444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14043163#comment-14043163 ]

Sylvain Lebresne commented on CASSANDRA-7444:
---------------------------------------------

Not sure if that's the same as what Brandon mentions, but currently {{DefsTable.mergeSchemaInternal}}
will read the existing schema in its entirety twice (pre and post update), no matter what
the actual update is. A pretty trivial improvement would consist in checking which keyspaces are
actually updated (by gathering the keys of the mutations passed as parameters) and only reading the
schema for those keyspaces.

This will obviously only help in the case of multiple keyspaces and won't magically make creating
crap tons of tables a good idea, but mentioning it as it's a very simple change we could start
with (note that we could get finer-grained than the keyspace to figure out what needs to be
read, but it's slightly more involved. Doing it for keyspace is really trivial since all schema
tables use the keyspace name as partition key).
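
To illustrate the idea, here is a minimal sketch in Python rather than Cassandra's actual Java code. It assumes a toy in-memory schema store keyed by keyspace name; the names {{read_schema}}, {{apply_mutations}} and {{merge_schema}} are hypothetical and not part of {{DefsTable}}. The point is only that the pre- and post-update reads can be restricted to the keyspaces named by the mutation keys.

```python
from typing import Dict, Iterable, List, Optional, Set, Tuple

# Hypothetical stand-in for the schema tables: keyspace name -> {table name -> definition}.
Schema = Dict[str, Dict[str, str]]
# Hypothetical mutation: (keyspace name, table name, new definition, or None to drop the table).
Mutation = Tuple[str, str, Optional[str]]


def read_schema(store: Schema, keyspaces: Optional[Iterable[str]] = None) -> Schema:
    """Return a snapshot of the schema, restricted to `keyspaces` when given."""
    names = list(store) if keyspaces is None else [k for k in keyspaces if k in store]
    return {ks: dict(store[ks]) for ks in names}


def apply_mutations(store: Schema, mutations: List[Mutation]) -> None:
    """Apply each mutation to the store in place."""
    for ks, table, definition in mutations:
        if definition is None:
            store.get(ks, {}).pop(table, None)
        else:
            store.setdefault(ks, {})[table] = definition


def merge_schema(store: Schema, mutations: List[Mutation]) -> Tuple[Set[str], Dict[str, Set[str]]]:
    """Apply mutations, reading only the affected keyspaces before and after.

    Returns the set of keyspaces that were read and, per keyspace, the tables that changed.
    """
    # The mutation keys are keyspace names, so they tell us what can have changed.
    affected = {ks for ks, _, _ in mutations}
    before = read_schema(store, affected)
    apply_mutations(store, mutations)
    after = read_schema(store, affected)

    diff: Dict[str, Set[str]] = {}
    for ks in affected:
        old, new = before.get(ks, {}), after.get(ks, {})
        changed = {t for t in old.keys() | new.keys() if old.get(t) != new.get(t)}
        if changed:
            diff[ks] = changed
    return affected, diff


if __name__ == "__main__":
    store = {"ks1": {"t1": "def-a"}, "ks2": {"t1": "def-b"}}
    read, diff = merge_schema(store, [("ks2", "t2", "def-c")])
    print(read, diff)
```

With many keyspaces, each update in this sketch reads a single keyspace twice instead of the whole schema twice. Within one keyspace holding many tables, the cost per update stays the same, which is the limitation noted above.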

> Performance drops when creating large amount of tables 
> -------------------------------------------------------
>
>                 Key: CASSANDRA-7444
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7444
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>         Environment: [cqlsh 3.1.8 | Cassandra 1.2.15.1 | CQL spec 3.0.0 | Thrift protocol 19.36.2][cqlsh 4.1.1 | Cassandra 2.0.7.31 | CQL spec 3.1.1 | Thrift protocol 19.39.0]
>            Reporter: Jose Martinez Poblete
>            Priority: Minor
>              Labels: cassandra
>
> We are creating 4000 tables from a script, using cqlsh to create the tables. As the tables are being created, the time taken grows exponentially and creation becomes very slow.
> We read a file, get the keyspace name, append a random number, and then create a keyspace with this new name (for example Airplane_12345678, Airplane_123575849...), which is then fed into cqlsh via a script.
> Similarly, each table is created via script: use Airplane_12345678; create table1...table25, then use Airplane_123575849; create table1...create table25
> It is all done serially, one statement after the other in a loop.
> We tested using the following bash script
> {noformat}
> #!/bin/bash
> SEED=0
> ITERATIONS=20
> while [ ${SEED} -lt ${ITERATIONS} ]; do
>    COUNT=0
>    KEYSPACE=t10789_${SEED}
>    echo "CREATE KEYSPACE ${KEYSPACE} WITH replication = { 'class': 'NetworkTopologyStrategy', 'Cassandra': '1' };"  > ${KEYSPACE}.ddl
>    echo "USE ${KEYSPACE};" >> ${KEYSPACE}.ddl
>    while [ ${COUNT} -lt 25 ]; do
>       echo "CREATE TABLE user_colors${COUNT} (user_id int PRIMARY KEY, colors list<ascii> );" >> ${KEYSPACE}.ddl
>       ((COUNT++))
>    done 
>    ((SEED++))
>    time cat ${KEYSPACE}.ddl | cqlsh
>    if [ "$?" -gt 0 ]; then
>       echo "[ERROR] Failure at ${KEYSPACE}"
>       exit 1
>    else
>       echo "[OK]    Created ${KEYSPACE}"
>    fi
>    echo "==============================="
>    sleep 3
> done
> #EOF
> {noformat}
> The timings we got on an otherwise idle system were inconsistent:
> {noformat}
> real    0m42.649s
> user    0m0.332s
> sys     0m0.092s
> [OK]    Created t10789_0
> ===============================
> real    1m22.211s
> user    0m0.332s
> sys     0m0.096s
> [OK]    Created t10789_1
> ===============================
> real    2m45.907s
> user    0m0.304s
> sys     0m0.124s
> [OK]    Created t10789_2
> ===============================
> real    3m24.098s
> user    0m0.340s
> sys     0m0.108s
> [OK]    Created t10789_3
> ===============================
> real    2m38.930s
> user    0m0.324s
> sys     0m0.116s
> [OK]    Created t10789_4
> ===============================
> real    3m4.186s
> user    0m0.336s
> sys     0m0.104s
> [OK]    Created t10789_5
> ===============================
> real    2m55.391s
> user    0m0.344s
> sys     0m0.092s
> [OK]    Created t10789_6
> ===============================
> real    2m14.290s
> user    0m0.328s
> sys     0m0.108s
> [OK]    Created t10789_7
> ===============================
> real    2m44.880s
> user    0m0.344s
> sys     0m0.092s
> [OK]    Created t10789_8
> ===============================
> real    1m52.785s
> user    0m0.336s
> sys     0m0.128s
> [OK]    Created t10789_9
> ===============================
> real    1m18.404s
> user    0m0.344s
> sys     0m0.108s
> [OK]    Created t10789_10
> ===============================
> real    2m20.681s
> user    0m0.348s
> sys     0m0.104s
> [OK]    Created t10789_11
> ===============================
> real    1m11.860s
> user    0m0.332s
> sys     0m0.096s
> [OK]    Created t10789_12
> ===============================
> real    1m37.887s
> user    0m0.324s
> sys     0m0.100s
> [OK]    Created t10789_13
> ===============================
> real    1m31.616s
> user    0m0.316s
> sys     0m0.132s
> [OK]    Created t10789_14
> ===============================
> real    1m12.103s
> user    0m0.360s
> sys     0m0.088s
> [OK]    Created t10789_15
> ===============================
> real    0m36.378s
> user    0m0.340s
> sys     0m0.092s
> [OK]    Created t10789_16
> ===============================
> real    0m40.883s
> user    0m0.352s
> sys     0m0.096s
> [OK]    Created t10789_17
> ===============================
> real    0m40.661s
> user    0m0.332s
> sys     0m0.096s
> [OK]    Created t10789_18
> ===============================
> real    0m44.943s
> user    0m0.324s
> sys     0m0.104s
> [OK]    Created t10789_19
> ===============================
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)
