cassandra-commits mailing list archives

From "Robert Stupp (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-7444) Performance drops when creating large amount of tables
Date Sat, 14 Mar 2015 23:32:39 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-7444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14362099#comment-14362099
] 

Robert Stupp commented on CASSANDRA-7444:
-----------------------------------------

The problem here is updating the schema version, which has to read all involved tables ({{schema_keyspaces}},
{{schema_columnfamilies}}, {{schema_columns}}, {{schema_triggers}}, {{schema_usertypes}},
{{schema_functions}}, {{schema_aggregates}}).
This patch should give some improvement, but the real culprit is {{calculateSchemaDigest()}}.
A better solution might be to calculate one digest per keyspace and then calculate the schema digest
from the individual keyspace digests. But that would only make sense once we have the whole schema
in memory (i.e. not having to read the schema from the tables every time).
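
To illustrate the per-keyspace digest idea, here is a minimal sketch in Python. It is not Cassandra code: the class {{SchemaDigestCache}}, its methods, and the representation of schema rows as plain strings are all hypothetical, and MD5 is used only for illustration. The point is that a schema change only re-hashes the rows of the affected keyspace, while the overall digest is combined from the cached per-keyspace digests.

```python
import hashlib


def keyspace_digest(rows):
    """Digest of one keyspace's schema rows.

    Assumption: each row is a string; a real implementation would hash
    the serialized rows of the schema tables instead.
    """
    md = hashlib.md5()
    for row in sorted(rows):          # sort so row order does not matter
        md.update(row.encode("utf-8"))
        md.update(b"\0")              # separator to avoid ambiguous concatenation
    return md.digest()


class SchemaDigestCache:
    """Hypothetical in-memory cache of per-keyspace digests."""

    def __init__(self):
        self._per_keyspace = {}

    def update_keyspace(self, name, rows):
        # Only the changed keyspace is re-hashed.
        self._per_keyspace[name] = keyspace_digest(rows)

    def drop_keyspace(self, name):
        self._per_keyspace.pop(name, None)

    def schema_digest(self):
        # Combine the cached digests in a stable (sorted) keyspace order.
        md = hashlib.md5()
        for name in sorted(self._per_keyspace):
            md.update(name.encode("utf-8"))
            md.update(b"\0")
            md.update(self._per_keyspace[name])
        return md.hexdigest()
```

With such a cache, creating a table would cost one {{update_keyspace}} call for its keyspace plus a cheap recombination, instead of a full read of all schema tables. Note that a digest combined this way would differ in value from one computed over the whole schema in a single pass.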

> Performance drops when creating large amount of tables 
> -------------------------------------------------------
>
>                 Key: CASSANDRA-7444
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7444
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>         Environment: [cqlsh 3.1.8 | Cassandra 1.2.15.1 | CQL spec 3.0.0 | Thrift protocol 19.36.2][cqlsh 4.1.1 | Cassandra 2.0.7.31 | CQL spec 3.1.1 | Thrift protocol 19.39.0]
>            Reporter: Jose Martinez Poblete
>            Assignee: Aleksey Yeschenko
>            Priority: Minor
>             Fix For: 2.1.1
>
>         Attachments: 7444-2.0.txt, 7444.txt
>
>
> We are creating 4000 tables from a script, using cqlsh to create the tables. As the
tables are being created, the time taken grows exponentially and creation becomes very slow.
> We read a file, get the keyspace name, append a random number, and then create a keyspace with
this new name (for example Airplane_12345678, Airplane_123575849, ...), which is then fed into cqlsh via the script.
> Similarly, each table is created via the script: use Airplane_12345678; create table1...table25,
then use Airplane_123575849; create table1...table25.
> It is all done serially, one statement after the other in a loop.
> We tested using the following bash script:
> {noformat}
> #!/bin/bash
> SEED=0
> ITERATIONS=20
> while [ ${SEED} -lt ${ITERATIONS} ]; do
>    COUNT=0
>    KEYSPACE=t10789_${SEED}
>    echo "CREATE KEYSPACE ${KEYSPACE} WITH replication = { 'class': 'NetworkTopologyStrategy', 'Cassandra': '1' };"  > ${KEYSPACE}.ddl
>    echo "USE ${KEYSPACE};" >> ${KEYSPACE}.ddl
>    while [ ${COUNT} -lt 25 ]; do
>       echo "CREATE TABLE user_colors${COUNT} (user_id int PRIMARY KEY, colors list<ascii> );" >> ${KEYSPACE}.ddl
>       ((COUNT++))
>    done 
>    ((SEED++))
>    time cat ${KEYSPACE}.ddl | cqlsh
>    if [ "$?" -gt 0 ]; then
>       echo "[ERROR] Failure at ${KEYSPACE}"
>       exit 1
>    else
>       echo "[OK]    Created ${KEYSPACE}"
>    fi
>    echo "==============================="
>    sleep 3
> done
> #EOF
> {noformat}
> The timings we got on an otherwise idle system were inconsistent:
> {noformat}
> real    0m42.649s
> user    0m0.332s
> sys     0m0.092s
> [OK]    Created t10789_0
> ===============================
> real    1m22.211s
> user    0m0.332s
> sys     0m0.096s
> [OK]    Created t10789_1
> ===============================
> real    2m45.907s
> user    0m0.304s
> sys     0m0.124s
> [OK]    Created t10789_2
> ===============================
> real    3m24.098s
> user    0m0.340s
> sys     0m0.108s
> [OK]    Created t10789_3
> ===============================
> real    2m38.930s
> user    0m0.324s
> sys     0m0.116s
> [OK]    Created t10789_4
> ===============================
> real    3m4.186s
> user    0m0.336s
> sys     0m0.104s
> [OK]    Created t10789_5
> ===============================
> real    2m55.391s
> user    0m0.344s
> sys     0m0.092s
> [OK]    Created t10789_6
> ===============================
> real    2m14.290s
> user    0m0.328s
> sys     0m0.108s
> [OK]    Created t10789_7
> ===============================
> real    2m44.880s
> user    0m0.344s
> sys     0m0.092s
> [OK]    Created t10789_8
> ===============================
> real    1m52.785s
> user    0m0.336s
> sys     0m0.128s
> [OK]    Created t10789_9
> ===============================
> real    1m18.404s
> user    0m0.344s
> sys     0m0.108s
> [OK]    Created t10789_10
> ===============================
> real    2m20.681s
> user    0m0.348s
> sys     0m0.104s
> [OK]    Created t10789_11
> ===============================
> real    1m11.860s
> user    0m0.332s
> sys     0m0.096s
> [OK]    Created t10789_12
> ===============================
> real    1m37.887s
> user    0m0.324s
> sys     0m0.100s
> [OK]    Created t10789_13
> ===============================
> real    1m31.616s
> user    0m0.316s
> sys     0m0.132s
> [OK]    Created t10789_14
> ===============================
> real    1m12.103s
> user    0m0.360s
> sys     0m0.088s
> [OK]    Created t10789_15
> ===============================
> real    0m36.378s
> user    0m0.340s
> sys     0m0.092s
> [OK]    Created t10789_16
> ===============================
> real    0m40.883s
> user    0m0.352s
> sys     0m0.096s
> [OK]    Created t10789_17
> ===============================
> real    0m40.661s
> user    0m0.332s
> sys     0m0.096s
> [OK]    Created t10789_18
> ===============================
> real    0m44.943s
> user    0m0.324s
> sys     0m0.104s
> [OK]    Created t10789_19
> ===============================
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
