Date: Sat, 14 Mar 2015 23:32:39 +0000 (UTC)
From: "Robert Stupp (JIRA)"
To: commits@cassandra.apache.org
Reply-To: dev@cassandra.apache.org
Subject: [jira] [Commented] (CASSANDRA-7444) Performance drops when creating large amount of tables

[ https://issues.apache.org/jira/browse/CASSANDRA-7444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14362099#comment-14362099 ]

Robert Stupp commented on CASSANDRA-7444:
-----------------------------------------

The problem here is updating the schema version, which has to read all involved tables ({{schema_keyspaces}}, {{schema_columnfamilies}}, {{schema_columns}}, {{schema_triggers}}, {{schema_usertypes}}, {{schema_functions}}, {{schema_aggregates}}). This patch should give some improvement, but the really "guilty" part is {{calculateSchemaDigest()}}.
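To make the cost concrete, here is a minimal Python sketch of what recomputing a digest over every schema row after each change implies. It is a toy model and not the Cassandra code: the function names, the in-memory dict standing in for the schema tables, the two rows added per table, and the use of MD5 are all assumptions made for illustration.

```python
import hashlib


def full_schema_digest(schema_tables):
    """Hash every row of every schema table in a stable order.

    schema_tables is a hypothetical stand-in: a dict mapping a schema
    table name to a list of row strings. Returns the hex digest and the
    number of rows that had to be read to produce it.
    """
    md5 = hashlib.md5()  # assumption: any stable hash would do here
    rows_read = 0
    for table_name in sorted(schema_tables):
        for row in sorted(schema_tables[table_name]):
            md5.update(table_name.encode("utf-8"))
            md5.update(row.encode("utf-8"))
            rows_read += 1
    return md5.hexdigest(), rows_read


def simulate_table_creation(num_tables):
    """Create tables one at a time, recomputing the full digest each time.

    Returns the total number of schema rows read across all recomputations.
    """
    schema_tables = {"schema_columnfamilies": [], "schema_columns": []}
    total_rows_read = 0
    for i in range(num_tables):
        # Each new table adds one row to each of the two modelled tables.
        schema_tables["schema_columnfamilies"].append(f"ks.table{i}")
        schema_tables["schema_columns"].append(f"ks.table{i}.user_id")
        _, rows_read = full_schema_digest(schema_tables)
        total_rows_read += rows_read
    return total_rows_read
```

In this model the i-th CREATE TABLE re-reads 2*i rows, so creating n tables reads n*(n+1) rows in total: the work per statement grows with the size of the existing schema, and the total grows roughly quadratically.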
A better solution might be to calculate one digest per keyspace and then calculate the schema digest from the individual keyspace digests. But that would only make sense once we have all that stuff in memory (i.e. not having to read the schema from the tables every time).

> Performance drops when creating large amount of tables
> -------------------------------------------------------
>
> Key: CASSANDRA-7444
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7444
> Project: Cassandra
> Issue Type: Improvement
> Components: Core
> Environment: [cqlsh 3.1.8 | Cassandra 1.2.15.1 | CQL spec 3.0.0 | Thrift protocol 19.36.2][cqlsh 4.1.1 | Cassandra 2.0.7.31 | CQL spec 3.1.1 | Thrift protocol 19.39.0]
> Reporter: Jose Martinez Poblete
> Assignee: Aleksey Yeschenko
> Priority: Minor
> Fix For: 2.1.1
>
> Attachments: 7444-2.0.txt, 7444.txt
>
>
> We are creating 4000 tables from a script, using cqlsh to create the tables. As the tables are being created, the time taken grows exponentially and it becomes very slow.
> We read a file, get the keyspace, append a random number, and then create a keyspace with this new name, for example Airplane_12345678, Airplane_123575849... These are then fed into cqlsh via script.
> Similarly, each table is created via script: use Airplane_12345678; create table1...table25, then use Airplane_123575849; create table1...create table25
> It is all done in singleton fashion, one after the other in a loop.
> We tested using the following bash script:
> {noformat}
> #!/bin/bash
> SEED=0
> ITERATIONS=20
> while [ ${SEED} -lt ${ITERATIONS} ]; do
>    COUNT=0
>    KEYSPACE=t10789_${SEED}
>    echo "CREATE KEYSPACE ${KEYSPACE} WITH replication = { 'class': 'NetworkTopologyStrategy', 'Cassandra': '1' };" > ${KEYSPACE}.ddl
>    echo "USE ${KEYSPACE};" >> ${KEYSPACE}.ddl
>    while [ ${COUNT} -lt 25 ]; do
>       echo "CREATE TABLE user_colors${COUNT} (user_id int PRIMARY KEY, colors list );" >> ${KEYSPACE}.ddl
>       ((COUNT++))
>    done
>    ((SEED++))
>    time cat ${KEYSPACE}.ddl | cqlsh
>    if [ "$?" -gt 0 ]; then
>       echo "[ERROR] Failure at ${KEYSPACE}"
>       exit 1
>    else
>       echo "[OK] Created ${KEYSPACE}"
>    fi
>    echo "==============================="
>    sleep 3
> done
> #EOF
> {noformat}
> The timings we got on an otherwise idle system were inconsistent:
> {noformat}
> real 0m42.649s
> user 0m0.332s
> sys 0m0.092s
> [OK] Created t10789_0
> ===============================
> real 1m22.211s
> user 0m0.332s
> sys 0m0.096s
> [OK] Created t10789_1
> ===============================
> real 2m45.907s
> user 0m0.304s
> sys 0m0.124s
> [OK] Created t10789_2
> ===============================
> real 3m24.098s
> user 0m0.340s
> sys 0m0.108s
> [OK] Created t10789_3
> ===============================
> real 2m38.930s
> user 0m0.324s
> sys 0m0.116s
> [OK] Created t10789_4
> ===============================
> real 3m4.186s
> user 0m0.336s
> sys 0m0.104s
> [OK] Created t10789_5
> ===============================
> real 2m55.391s
> user 0m0.344s
> sys 0m0.092s
> [OK] Created t10789_6
> ===============================
> real 2m14.290s
> user 0m0.328s
> sys 0m0.108s
> [OK] Created t10789_7
> ===============================
> real 2m44.880s
> user 0m0.344s
> sys 0m0.092s
> [OK] Created t10789_8
> ===============================
> real 1m52.785s
> user 0m0.336s
> sys 0m0.128s
> [OK] Created t10789_9
> ===============================
> real 1m18.404s
> user 0m0.344s
> sys 0m0.108s
> [OK] Created t10789_10
> ===============================
> real 2m20.681s
> user 0m0.348s
> sys 0m0.104s
> [OK] Created t10789_11
> ===============================
> real 1m11.860s
> user 0m0.332s
> sys 0m0.096s
> [OK] Created t10789_12
> ===============================
> real 1m37.887s
> user 0m0.324s
> sys 0m0.100s
> [OK] Created t10789_13
> ===============================
> real 1m31.616s
> user 0m0.316s
> sys 0m0.132s
> [OK] Created t10789_14
> ===============================
> real 1m12.103s
> user 0m0.360s
> sys 0m0.088s
> [OK] Created t10789_15
> ===============================
> real 0m36.378s
> user 0m0.340s
> sys 0m0.092s
> [OK] Created t10789_16
> ===============================
> real 0m40.883s
> user 0m0.352s
> sys 0m0.096s
> [OK] Created t10789_17
> ===============================
> real 0m40.661s
> user 0m0.332s
> sys 0m0.096s
> [OK] Created t10789_18
> ===============================
> real 0m44.943s
> user 0m0.324s
> sys 0m0.104s
> [OK] Created t10789_19
> ===============================
> {noformat}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)