From user-return-63379-archive-asf-public=cust-asf.ponee.io@cassandra.apache.org Wed Mar 6 01:10:55 2019 Return-Path: X-Original-To: archive-asf-public@cust-asf.ponee.io Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by mx-eu-01.ponee.io (Postfix) with SMTP id BDE1E18067C for ; Wed, 6 Mar 2019 02:10:54 +0100 (CET) Received: (qmail 69111 invoked by uid 500); 6 Mar 2019 01:10:47 -0000 Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: user@cassandra.apache.org Delivered-To: mailing list user@cassandra.apache.org Received: (qmail 69101 invoked by uid 99); 6 Mar 2019 01:10:47 -0000 Received: from pnap-us-west-generic-nat.apache.org (HELO spamd1-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 06 Mar 2019 01:10:47 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd1-us-west.apache.org (ASF Mail Server at spamd1-us-west.apache.org) with ESMTP id 744FEC0853 for ; Wed, 6 Mar 2019 01:10:47 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd1-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: 1.81 X-Spam-Level: * X-Spam-Status: No, score=1.81 tagged_above=-999 required=6.31 tests=[DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, HTML_MESSAGE=2, RCVD_IN_DNSWL_NONE=-0.0001, SPF_PASS=-0.001, T_REMOTE_IMAGE=0.01, URIBL_BLOCKED=0.001] autolearn=disabled Authentication-Results: spamd1-us-west.apache.org (amavisd-new); dkim=pass (1024-bit key) header.d=instaclustr.com Received: from mx1-lw-eu.apache.org ([10.40.0.8]) by localhost (spamd1-us-west.apache.org [10.40.0.7]) (amavisd-new, port 10024) with ESMTP id Aw5DUKMc_pXf for ; Wed, 6 Mar 2019 01:10:44 +0000 (UTC) Received: from mail-vs1-f48.google.com (mail-vs1-f48.google.com [209.85.217.48]) by mx1-lw-eu.apache.org (ASF Mail Server at mx1-lw-eu.apache.org) with ESMTPS id BB8285F336 for ; Wed, 6 Mar 2019 01:10:43 +0000 (UTC) Received: by mail-vs1-f48.google.com with SMTP id h7so1212094vsl.1 for ; Tue, 05 Mar 2019 17:10:43 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=instaclustr.com; s=instaclustr.email.auth; h=mime-version:references:in-reply-to:from:date:message-id:subject:to; bh=doVMX5Cs1g0oo/emcMFoz8QSqWxocE+Un1L5xewsTjs=; b=Uoe8hzv4+/obm1KAycY4om4dYgKv8gcnGpOpEtXenOzE+whfrozLvTdEdH4daYLSv/ MjJDDAYYFDl1BJohjXWJ0XsY+NuZkGiD2oHQU8cr3dBPlfyMI98ZO9kMs4kh4BWc+dvU p0VdAITBq42/9BxpW1YME6Y5M+VMXbYc8a9E4= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to; bh=doVMX5Cs1g0oo/emcMFoz8QSqWxocE+Un1L5xewsTjs=; b=jqaUN9zCVtW/3lvQ5gLpQt9vCZHfaTmoLvwI6YoqatEICCXxeZYW0JZdtzu9UIOQTM zfl9N+cAoGvCRBVvnlbseZawQqDPUq7S15BEFUHtNsCsTSKGhz2ghMRkCDrVJw1TuSe5 h6kS16q5UiAk+X03zFI8ozNhiA+GZMSsAW1teO06jMhiv8F4vz9vikp+R65cBH8INZiI rNzVNZLonFRvOS9+E/ZluYtB9TNpTMdO/bb5gJTB+2kdKk4tZx0tV5hT42NfyuBtYEmG mf2GgP5FdYa+mCNRUPRyBgARu9OfA042qTNNSuz9ojI+tklFWvU57BRkRbuMvW8/NmP0 t8jQ== X-Gm-Message-State: APjAAAXxv5iCmd+xrzjqO2rf6futTZG82oS9PT4XEJKKPUQFlNeTWJfq oiNKZEMf/0L7dAQwlz/ZdDbzpL9Y31ToJfx8u/nTPEmpSSo= X-Google-Smtp-Source: APXvYqx2UJCJ5EpgTwg4Ob41Q5X7NUc9bJ09z3KhBIfhUhCPkPmjY4UgvMi7yO6HWv5kPVSq4weQX3cpsz3C3kafcqU= X-Received: by 2002:a67:b346:: with SMTP id b6mr2657765vsm.97.1551834642421; Tue, 05 Mar 2019 17:10:42 -0800 (PST) MIME-Version: 1.0 References: In-Reply-To: From: Stefan Miklosovic Date: Wed, 6 Mar 2019 12:10:31 +1100 Message-ID: Subject: Re: data modelling To: user@cassandra.apache.org Content-Type: multipart/alternative; boundary="0000000000009cb04a058362a8c6" --0000000000009cb04a058362a8c6 Content-Type: text/plain; charset="UTF-8" Hi Bobbie, as Kenneth already mentioned, you should model your schema based on what queries you are expecting to do and read related literature. From what I see your table is named "customer_sensor_tagids" so its quite possible you would have tagids as a part of primary key? Something like: select * from keyspace.customer_sensor_tagids where tag_id = 11358097. This implies that you would have as many records per customer and sensor ids as many tag_id's there are. If you want to query such table and you know customerid and sensorid in advance, you could query like select * from keyspace.customer_sensor_tagids where customerid = X and sensorid =Y and tag_id = 11358097 so your primary key would look like (customerid, sensorid, tagid) or ((customerid, sensorid), tagid) If you do not know customerid nor sensorid while doing a query, you would have to make tag_id a partition key and customerid and sensorid clustering columns, optionally ordered, thats up to you. Now you may object that there would be data duplication as you would have to have "as many tables as queries" which might be true but thats not in general a problem. Thats the cost you "pay" for having queries super fast and tailored for your use case. I suggest to read more about data modelling in general. On Wed, 6 Mar 2019 at 11:19, Bobbie Haynes wrote: > Hi > Could you help modelling this usecase > > I have below table ..I will update tagid's columns set(bigit) based on > PK. I have created the secondary index column on tagid to query like below.. > > Select * from keyspace.customer_sensor_tagids where tagids CONTAINS > 11358097; > > this query is doing the range scan because of the secondary index.. and > causing performance issues > > If i create a MV on Tagid's can i be able to query like above.. please > suggest a Datamodel for this scenario.Apprecite your help on this. > > ----------------------------------------------------------------------------------------------- > > ----------------------------------------------------------------------------------------------- > example of Tagids for each row:- > 4608831, 608886, 608890, 609164, 615024, 679579, 814791, 830404, 71756, > 8538307, 9936868, 10883336, 10954034, 10958062, 10976553, 10976554, > 10980255, 11009971, 11043805, 11075379, 11078819, 11167844, 11358097, > 11479340, 11481769, 11481770, 11481771, 11481772, 11693597, 11709012, > 12193230, 12421500, 12421516, 12421781, 12422011, 12422368, 12422501, > 12422512, 12422553, 12422555, 12423381, 12423382 > > > ----------------------------------------------------------------------------------------------- > > ----------------------------------------------------------------------------------------------- > > CREATE TABLE keyspace.customer_sensor_tagids ( > customerid bigint, > sensorid bigint, > XXX frozen, > XXX frozen, > XXX text, > XXX text, > XXX frozen, > XXX bigint, > XXX bigint, > XXX list>, > XXX frozen, > XXX boolean, > XXX bigint, > XXX list>, > XXX frozen, > XXX bigint, > XXX bigint, > XXX list>, > XXX list>, > XXX set>, > XXX set, > XXX set, > tagids set, > XXX bigint, > XXX list>, > PRIMARY KEY ((customerid, sensorid)) > ) WITH bloom_filter_fp_chance = 0.01 > AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'} > AND comment = '' > AND compaction = {'class': > 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', > 'max_threshold': '32', 'min_threshold': '4'} > AND compression = {'chunk_length_in_kb': '64', 'class': > 'org.apache.cassandra.io.compress.LZ4Compressor'} > AND crc_check_chance = 1.0 > AND dclocal_read_repair_chance = 0.1 > AND default_time_to_live = 0 > AND gc_grace_seconds = 864000 > AND max_index_interval = 2048 > AND memtable_flush_period_in_ms = 0 > AND min_index_interval = 128 > AND read_repair_chance = 0.0 > AND speculative_retry = '99PERCENTILE'; > CREATE INDEX XXX ON keyspace.customer_sensor_tagids (values(tagids)); > CREATE INDEX XXX ON keyspace.customer_sensor_tagids (values(XXX)); > CREATE INDEX XXX ON keyspace.customer_sensor_tagids (XXX); > CREATE INDEX XXX ON keyspace.customer_sensor_tagids (XXX); > CREATE INDEX XXX ON keyspace.customer_sensor_tagids (XXX); > CREATE INDEX XXX ON keyspace.customer_sensor_tagids (XXX); > CREATE INDEX XXX ON keyspace.customer_sensor_tagids (values(XXX)); > CREATE INDEX XXX ON keyspace.customer_sensor_tagids (XXX); > -- *Stefan Miklosovic**Senior Software Engineer* M: +61459911436 Read our latest technical blog posts here . This email has been sent on behalf of Instaclustr Pty. Limited (Australia) and Instaclustr Inc (USA). This email and any attachments may contain confidential and legally privileged information. If you are not the intended recipient, do not copy or disclose its content, but please reply to this email immediately and highlight the error to the sender and then immediately delete the message. Instaclustr values your privacy. Our privacy policy can be found at https://www.instaclustr.com/company/policies/privacy-policy --0000000000009cb04a058362a8c6 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
Hi Bobbie,

as Kenneth already mentioned= , you should model your schema based on what queries you are expecting to d= o and read related literature. From what I see your table is named "cu= stomer_sensor_tagids" so its quite possible you would have tagids as a= part of primary key? Something like:

select * fro= m keyspace.customer_sensor_tagids where tag_id =3D 11358097.

=
This implies that you would have as many records per customer an= d sensor ids as many tag_id's there are. If you want to query such tabl= e and you know customerid and sensorid in advance, you could query like

select * from keyspace.customer_sensor_tagids where c= ustomerid =3D X and sensorid =3DY and tag_id =3D 11358097
so your primary key would look like (customerid, sensorid, tagi= d) or ((customerid, sensorid), tagid)

If you do no= t know customerid nor sensorid while doing a query, you would have to make = tag_id a partition key and customerid and sensorid clustering columns, opti= onally ordered, thats up to you. Now you may object that there would be dat= a duplication as you would have to have "as many tables as queries&quo= t; which might be true but thats not in general a problem. Thats the cost y= ou "pay" for having queries super fast and tailored for your use = case.

I suggest to read more about data modelling = in general.

On Wed, 6 Mar 2019 at 11:19, Bobbie Haynes <haynes30349@gmail.com> wrote:
Hi=C2=A0
=C2=A0 =C2=A0Could you help=C2=A0 modelli= ng this usecase=C2=A0

=C2=A0 =C2=A0I have below ta= ble ..I will update tagid's columns set(bigit) based on PK. I have crea= ted the secondary index column on tagid to query like below..
Select * from keyspace.customer_sensor_tagids where tagids CONT= AINS 11358097;

this query is doing the range scan = because of the secondary index.. and causing performance issues=C2=A0
=

If i create a MV on Tagid's can i be able to query = like above.. please suggest a Datamodel for this scenario.Apprecite your he= lp on this.
-----------------------------------------------------= ------------------------------------------
----------------------= -------------------------------------------------------------------------
example of Tagids for each row:-
=C2=A0 =C2=A04608831, 6= 08886, 608890, 609164, 615024, 679579, 814791, 830404, 71756, 8538307, 9936= 868, 10883336, 10954034, 10958062, 10976553, 10976554, 10980255, 11009971, = 11043805, 11075379, 11078819, 11167844, 11358097, 11479340, 11481769, 11481= 770, 11481771, 11481772, 11693597, 11709012, 12193230, 12421500, 12421516, = 12421781, 12422011, 12422368, 12422501, 12422512, 12422553, 12422555, 12423= 381, 12423382

=C2=A0 =C2=A0-----------------------= ------------------------------------------------------------------------
-------------------------------------------------------------------= ----------------------------=C2=A0

=C2=A0 =C2=A0CR= EATE TABLE keyspace.customer_sensor_tagids (
=C2=A0 =C2=A0 custom= erid bigint,
=C2=A0 =C2=A0 sensorid bigint,
=C2=A0 =C2= =A0 XXX frozen<activity>,
=C2=A0 =C2=A0 XXX frozen<asset= >,
=C2=A0 =C2=A0 XXX text,
=C2=A0 =C2=A0 XXX text,
=C2=A0 =C2=A0 XXX frozen<hardware>,
=C2=A0 =C2=A0 = XXX bigint,
=C2=A0 =C2=A0 XXX bigint,
=C2=A0 =C2=A0 XXX= list<frozen<interface>>,
=C2=A0 =C2=A0 XXX frozen<= ;inventory>,
=C2=A0 =C2=A0 XXX boolean,
=C2=A0 =C2= =A0 XXX bigint,
=C2=A0 =C2=A0 XXX list<frozen<openport>&= gt;,
=C2=A0 =C2=A0 XXX frozen<operatingsystem>,
= =C2=A0 =C2=A0 XXX bigint,
=C2=A0 =C2=A0 XXX bigint,
=C2= =A0 =C2=A0 XXX list<frozen<processor>>,
=C2=A0 =C2=A0= XXX list<frozen<service>>,
=C2=A0 =C2=A0 XXX set<= frozen<software>>,
=C2=A0 =C2=A0 XXX set<bigint>,<= /div>
=C2=A0 =C2=A0 XXX set<bigint>,
=C2=A0 =C2=A0 tagi= ds set<bigint>,
=C2=A0 =C2=A0 XXX bigint,
=C2=A0 = =C2=A0 XXX list<frozen<volume>>,
=C2=A0 =C2=A0 PRIMAR= Y KEY ((customerid, sensorid))
) WITH bloom_filter_fp_chance =3D = 0.01
=C2=A0 =C2=A0 AND caching =3D {'keys': 'ALL'= , 'rows_per_partition': 'NONE'}
=C2=A0 =C2=A0 AND= comment =3D ''
=C2=A0 =C2=A0 AND compaction =3D {'cl= ass': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrat= egy', 'max_threshold': '32', 'min_threshold': &= #39;4'}
=C2=A0 =C2=A0 AND compression =3D {'chunk_length_= in_kb': '64', 'class': 'org.apache.cassandra.io.com= press.LZ4Compressor'}
=C2=A0 =C2=A0 AND crc_check_chance =3D = 1.0
=C2=A0 =C2=A0 AND dclocal_read_repair_chance =3D 0.1
=C2=A0 =C2=A0 AND default_time_to_live =3D 0
=C2=A0 =C2=A0 AND = gc_grace_seconds =3D 864000
=C2=A0 =C2=A0 AND max_index_interval = =3D 2048
=C2=A0 =C2=A0 AND memtable_flush_period_in_ms =3D 0
=C2=A0 =C2=A0 AND min_index_interval =3D 128
=C2=A0 =C2=A0 = AND read_repair_chance =3D 0.0
=C2=A0 =C2=A0 AND speculative_retr= y =3D '99PERCENTILE';
CREATE INDEX XXX ON keyspace.custom= er_sensor_tagids (values(tagids));
CREATE INDEX XXX ON keyspace.c= ustomer_sensor_tagids (values(XXX));
CREATE INDEX XXX ON keyspace= .customer_sensor_tagids (XXX);
CREATE INDEX XXX ON keyspace.custo= mer_sensor_tagids (XXX);
CREATE INDEX XXX ON keyspace.customer_se= nsor_tagids (XXX);
CREATE INDEX XXX ON keyspace.customer_sensor_t= agids (XXX);
CREATE INDEX XXX ON keyspace.customer_sensor_tagids = (values(XXX));
CREATE INDEX XXX ON keyspace.customer_sensor_tagid= s (XXX);


--

Stefan Miklosovic
Senior Software Engineer


M: +6= 1459911436


=C2=A0=C2=A0=C2=A0=C2=A0

R= ead our latest technical blog posts=C2=A0here.

This email has been sent on behalf of= =C2=A0Instaclustr Pty. Limited (Australia) and=C2=A0Instaclustr Inc (USA).<= /span>

This email and = any attachments may=C2=A0contain confidential and legally privileged=C2=A0i= nformation.=C2=A0 If you are not the intended=C2=A0recipient, do not copy o= r disclose its=C2=A0content, but please reply to this email=C2=A0immediatel= y and highlight the error to the=C2=A0sender and then immediately delete th= e=C2=A0message.

Instaclustr values your privacy. Our p= rivacy policy can be found at=C2=A0https://www.instaclustr.com/company/policies/= privacy-policy

--0000000000009cb04a058362a8c6--