From user-return-63378-archive-asf-public=cust-asf.ponee.io@cassandra.apache.org Wed Mar 6 01:10:17 2019 Return-Path: X-Original-To: archive-asf-public@cust-asf.ponee.io Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by mx-eu-01.ponee.io (Postfix) with SMTP id 2FDA5180648 for ; Wed, 6 Mar 2019 02:10:16 +0100 (CET) Received: (qmail 65225 invoked by uid 500); 6 Mar 2019 01:10:14 -0000 Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: user@cassandra.apache.org Delivered-To: mailing list user@cassandra.apache.org Received: (qmail 65204 invoked by uid 99); 6 Mar 2019 01:10:14 -0000 Received: from pnap-us-west-generic-nat.apache.org (HELO spamd3-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 06 Mar 2019 01:10:14 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd3-us-west.apache.org (ASF Mail Server at spamd3-us-west.apache.org) with ESMTP id D5F6D182E48 for ; Wed, 6 Mar 2019 01:10:13 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd3-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: 2.798 X-Spam-Level: ** X-Spam-Status: No, score=2.798 tagged_above=-999 required=6.31 tests=[DKIMWL_WL_MED=-0.001, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, FREEMAIL_REPLY=1, HTML_MESSAGE=2, RCVD_IN_DNSWL_NONE=-0.0001, SPF_PASS=-0.001] autolearn=disabled Authentication-Results: spamd3-us-west.apache.org (amavisd-new); dkim=pass (2048-bit key) header.d=yahoo.com Received: from mx1-lw-eu.apache.org ([10.40.0.8]) by localhost (spamd3-us-west.apache.org [10.40.0.10]) (amavisd-new, port 10024) with ESMTP id nX2tNzjJ0Yld for ; Wed, 6 Mar 2019 01:10:12 +0000 (UTC) Received: from sonic312-23.consmr.mail.ne1.yahoo.com (sonic312-23.consmr.mail.ne1.yahoo.com [66.163.191.204]) by mx1-lw-eu.apache.org (ASF Mail Server at mx1-lw-eu.apache.org) with ESMTPS id 1B6395F54E for ; Wed, 6 Mar 2019 01:10:10 +0000 (UTC) X-YMail-OSG: aZB0ClcVM1m1o7c.WQD38B50u2Agrt5HLCm95guthedmMGeArOlE3Eb86swpCFr 31Sc4F8f.Swv76wJjVpRRZD6WsEV15TvaKNCNd3MK2f24TdOUne2hbHAcwJBpriZJwv6MZOo5EAh iSRZXitibPEeVfC5cCaG0DM5tsYSGOETFhKFBDfajOLDpIagMwuKpxUmCn56HjrE_20MwWG8EHJq VtgcALqklJP6IUJee3IkRKGc.qVgSoVuy9LulSepEQmZ5Ry.R0Ym6epy0YM3b2YGBdSbZKvdxA7p A2rbL53B5r0M5t7.wPEwb26kbaFt1ZbXWxSCr8WWAyc1i8WTd9jdZNEYwGIucFpfDw85G_obRwol .QHajSh6kUsd1x6rM1cmDT9Hc9cdSuX67GvzNjO0KuxrLeYi38w0itouJEftDF0NlD6unqtP5tSg HHlMNtmj.jUMjSAD8ChpbbkfPmC.RbSJCYBffat9fdPMIXsuKbEFE3mnQDc3zKgWps2GXsDgZCT4 ZbyLijNHRdT16j4lQvrPxs0iWqIXYGesQNQGcYlS8.TYMZUQnnbfYnYecGDZiwkQ7myC1WoUbEB6 xdVgmkcwkxcnUNETUr.VtLpaeizWOeqqlgRGPnSflzhbP5CGRHkumwIuknUXKuGIDXuWwKcjrj4u V91DAKPSPHlHf_nqWfpnhgbsg3C33I3zI7q4rnkTnfF4fOKL_48mlHHLk.6D21jXMJC1lp6vCtWR wb0mqa1u5LpalOZGtJjodC3_pTQPw20hUD7YJDDbpunr_K0yC5K8SxAX1V3wd9dc6u.IuWMFHwl3 3Lpi7XYEQKWnqGTYi8mcbrlBYOwrZcTCAGwVIaO6XjJnFzOebn5Q6.Np.gjMUBiFMIbIRET8T8kl z7zinIZdomg5kxxQ.hxDUgFIve7ss0WNaQJyNdulsttrOFyJ7GhnSjH2AmH9u1eSXSLOIWfxBA45 IgWn6YV1nITzcBgFzdm7AO60ER4GumWRGyIdX5iL9t0nWOYj3aqoVJsCx8wjeoesxpkp4NmVbvdF ACMK.LUEfFXxbn03eLd_Y3TLtyFnWuA-- Received: from sonic.gate.mail.ne1.yahoo.com by sonic312.consmr.mail.ne1.yahoo.com with HTTP; Wed, 6 Mar 2019 01:10:03 +0000 Received: from c-24-10-87-155.hsd1.ca.comcast.net (EHLO i7) ([24.10.87.155]) by smtp402.mail.ne1.yahoo.com (Oath Hermes SMTP Server) with ESMTPA ID 1038e790f4b7800b726512c788237565 for ; Wed, 06 Mar 2019 01:09:58 +0000 (UTC) From: "Kenneth Brotman" To: References: <000301d4d3b4$2b6aa390$823feab0$@yahoo.com> In-Reply-To: <000301d4d3b4$2b6aa390$823feab0$@yahoo.com> Subject: RE: data modelling Date: Tue, 5 Mar 2019 17:09:56 -0800 Message-ID: <000001d4d3b9$50d53eb0$f27fbc10$@yahoo.com> MIME-Version: 1.0 Content-Type: multipart/alternative; boundary="----=_NextPart_000_0001_01D4D376.42B6B9A0" X-Mailer: Microsoft Outlook 14.0 Thread-Index: AQKlclPLdvPq/A2snBNkdAj5v+ZimQGcXi3OpE+mmZA= Content-Language: en-us ------=_NextPart_000_0001_01D4D376.42B6B9A0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable You definitely don=E2=80=99t need a secondary index. A MV might be the = answer. =20 =20 How many tagids does a sensor have ? Do you have to use a collection for tagids? How many sensors would you expect to have a particular tagid? Would you know the customerid and sensorid and be able to specify that = in the query? =20 If you could have tagid not be a collection, and make it part of the = primary key, that would help a lot. =20 =20 From: Kenneth Brotman [mailto:kenbrotman@yahoo.com.INVALID]=20 Sent: Tuesday, March 05, 2019 4:33 PM To: user@cassandra.apache.org Subject: RE: data modelling =20 Hi Bobbie, =20 You=E2=80=99re not giving enough information to model the data. With = Cassandra it=E2=80=99s based on the queries you are going to need. This = link to Jeffrey Carpenter=E2=80=99s book, Cassandra the Definitive = Guide, Chapter 5, which is on how to do data modeling for Cassandra, = should be of help to you: = https://books.google.com/books?id=3DuW-PDAAAQBAJ = = &pg=3DPA79&lpg=3DPA79&dq=3Djeff+carpenter+chapter+5&source=3Dbl&ots=3D58c= M-BII2M&sig=3DACfU3U0-188Fw-jcj1tbMItdPlNH8Lk9yQ&hl=3Den&sa=3DX&ved=3D2ah= UKEwinrY3OoezgAhWoHDQIHRfmA7IQ6AEwA3oECAcQAQ#v=3Donepage&q=3Djeff%20carpe= nter%20chapter%205&f=3Dfalse =20 =20 =20 From: Bobbie Haynes [mailto:haynes30349@gmail.com]=20 Sent: Tuesday, March 05, 2019 4:19 PM To: user@cassandra.apache.org Subject: data modelling =20 Hi=20 Could you help modelling this usecase=20 =20 I have below table ..I will update tagid's columns set(bigit) based = on PK. I have created the secondary index column on tagid to query like = below.. =20 Select * from keyspace.customer_sensor_tagids where tagids CONTAINS = 11358097; =20 this query is doing the range scan because of the secondary index.. and = causing performance issues=20 =20 If i create a MV on Tagid's can i be able to query like above.. please = suggest a Datamodel for this scenario.Apprecite your help on this. -------------------------------------------------------------------------= ---------------------- -------------------------------------------------------------------------= ---------------------- example of Tagids for each row:- 4608831, 608886, 608890, 609164, 615024, 679579, 814791, 830404, = 71756, 8538307, 9936868, 10883336, 10954034, 10958062, 10976553, = 10976554, 10980255, 11009971, 11043805, 11075379, 11078819, 11167844, = 11358097, 11479340, 11481769, 11481770, 11481771, 11481772, 11693597, = 11709012, 12193230, 12421500, 12421516, 12421781, 12422011, 12422368, = 12422501, 12422512, 12422553, 12422555, 12423381, 12423382 =20 = -------------------------------------------------------------------------= ---------------------- -------------------------------------------------------------------------= ----------------------=20 =20 CREATE TABLE keyspace.customer_sensor_tagids ( customerid bigint, sensorid bigint, XXX frozen, XXX frozen, XXX text, XXX text, XXX frozen, XXX bigint, XXX bigint, XXX list>, XXX frozen, XXX boolean, XXX bigint, XXX list>, XXX frozen, XXX bigint, XXX bigint, XXX list>, XXX list>, XXX set>, XXX set, XXX set, tagids set, XXX bigint, XXX list>, PRIMARY KEY ((customerid, sensorid)) ) WITH bloom_filter_fp_chance =3D 0.01 AND caching =3D {'keys': 'ALL', 'rows_per_partition': 'NONE'} AND comment =3D '' AND compaction =3D {'class': = 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', = 'max_threshold': '32', 'min_threshold': '4'} AND compression =3D {'chunk_length_in_kb': '64', 'class': = 'org.apache.cassandra.io.compress.LZ4Compressor'} AND crc_check_chance =3D 1.0 AND dclocal_read_repair_chance =3D 0.1 AND default_time_to_live =3D 0 AND gc_grace_seconds =3D 864000 AND max_index_interval =3D 2048 AND memtable_flush_period_in_ms =3D 0 AND min_index_interval =3D 128 AND read_repair_chance =3D 0.0 AND speculative_retry =3D '99PERCENTILE'; CREATE INDEX XXX ON keyspace.customer_sensor_tagids (values(tagids)); CREATE INDEX XXX ON keyspace.customer_sensor_tagids (values(XXX)); CREATE INDEX XXX ON keyspace.customer_sensor_tagids (XXX); CREATE INDEX XXX ON keyspace.customer_sensor_tagids (XXX); CREATE INDEX XXX ON keyspace.customer_sensor_tagids (XXX); CREATE INDEX XXX ON keyspace.customer_sensor_tagids (XXX); CREATE INDEX XXX ON keyspace.customer_sensor_tagids (values(XXX)); CREATE INDEX XXX ON keyspace.customer_sensor_tagids (XXX); ------=_NextPart_000_0001_01D4D376.42B6B9A0 Content-Type: text/html; charset="utf-8" Content-Transfer-Encoding: quoted-printable

You definitely don=E2=80=99t need a secondary index.=C2=A0 A MV might = be the answer.=C2=A0

 

How many tagids does a sensor have ?

Do you have to use a collection for tagids?

How many sensors would you expect to have a particular = tagid?

Would you know the customerid and sensorid and be able to specify = that in the query?

 

If you could have tagid not be a collection, and make it part of the = primary key, that would help a lot.

=C2=A0

 

From:= = Kenneth Brotman [mailto:kenbrotman@yahoo.com.INVALID]
Sent: = Tuesday, March 05, 2019 4:33 PM
To: = user@cassandra.apache.org
Subject: RE: data = modelling

 

Hi Bobbie,

 

You=E2=80=99re not giving enough information to model the data.  = With Cassandra it=E2=80=99s based on the queries you are going to = need.  This link to Jeffrey Carpenter=E2=80=99s book, Cassandra the = Definitive Guide, Chapter 5, which is on how to do data modeling for = Cassandra, should be of help to you: https://books.go= ogle.com/books?id=3DuW-PDAAAQBAJ&pg=3DPA79&lpg=3DPA79&dq=3Dje= ff+carpenter+chapter+5&source=3Dbl&ots=3D58cM-BII2M&sig=3DACf= U3U0-188Fw-jcj1tbMItdPlNH8Lk9yQ&hl=3Den&sa=3DX&ved=3D2ahUKEwi= nrY3OoezgAhWoHDQIHRfmA7IQ6AEwA3oECAcQAQ#v=3Donepage&q=3Djeff%20carpen= ter%20chapter%205&f=3Dfalse

 

 

 

From:= = Bobbie Haynes [mailto:haynes30349@gmail.com]
Sent: Tuesday, = March 05, 2019 4:19 PM
To: = user@cassandra.apache.org
Subject: data = modelling

 

Hi 

   Could you help  modelling this = usecase 

 

   I have below table ..I will update = tagid's columns set(bigit) based on PK. I have created the secondary = index column on tagid to query like below..

 

Select * from keyspace.customer_sensor_tagids where = tagids CONTAINS 11358097;

 

this query is doing the range scan because of the = secondary index.. and causing performance = issues 

 

If i create a MV on Tagid's can i be able to query = like above.. please suggest a Datamodel for this scenario.Apprecite your = help on this.

-------------------------------------------------------= ----------------------------------------

-------------------------------------------------------= ----------------------------------------

example of Tagids for each = row:-

  =  4608831, 608886, 608890, 609164, 615024, 679579, 814791, 830404, = 71756, 8538307, 9936868, 10883336, 10954034, 10958062, 10976553, = 10976554, 10980255, 11009971, 11043805, 11075379, 11078819, 11167844, = 11358097, 11479340, 11481769, 11481770, 11481771, 11481772, 11693597, = 11709012, 12193230, 12421500, 12421516, 12421781, 12422011, 12422368, = 12422501, 12422512, 12422553, 12422555, 12423381, = 12423382

 

  =  -------------------------------------------------------------------= ----------------------------

-------------------------------------------------------= ---------------------------------------- 

<= p class=3DMsoNormal> 

   CREATE TABLE = keyspace.customer_sensor_tagids (

    customerid = bigint,

    = sensorid bigint,

  =   XXX frozen<activity>,

    XXX = frozen<asset>,

    XXX text,

    XXX text,

    XXX = frozen<hardware>,

    XXX bigint,

    XXX bigint,

    XXX = list<frozen<interface>>,

    XXX = frozen<inventory>,

    XXX boolean,

    XXX bigint,

    XXX = list<frozen<openport>>,

    XXX = frozen<operatingsystem>,

    XXX bigint,

    XXX bigint,

    XXX = list<frozen<processor>>,

    XXX = list<frozen<service>>,

    XXX = set<frozen<software>>,

    XXX = set<bigint>,

  =   XXX set<bigint>,

    tagids = set<bigint>,

  =   XXX bigint,

  =   XXX list<frozen<volume>>,

    PRIMARY KEY ((customerid, = sensorid))

) WITH = bloom_filter_fp_chance =3D 0.01

    AND caching =3D {'keys': 'ALL', = 'rows_per_partition': 'NONE'}

    AND comment =3D = ''

    AND = compaction =3D {'class': = 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', = 'max_threshold': '32', 'min_threshold': '4'}

    AND compression =3D = {'chunk_length_in_kb': '64', 'class': = 'org.apache.cassandra.io.compress.LZ4Compressor'}

    AND crc_check_chance =3D = 1.0

    AND = dclocal_read_repair_chance =3D 0.1

    AND default_time_to_live =3D = 0

    AND = gc_grace_seconds =3D 864000

    AND max_index_interval =3D = 2048

    AND = memtable_flush_period_in_ms =3D 0

    AND min_index_interval =3D = 128

    AND = read_repair_chance =3D 0.0

    AND speculative_retry =3D = '99PERCENTILE';

CREATE = INDEX XXX ON keyspace.customer_sensor_tagids = (values(tagids));

CREATE = INDEX XXX ON keyspace.customer_sensor_tagids = (values(XXX));

CREATE = INDEX XXX ON keyspace.customer_sensor_tagids = (XXX);

CREATE INDEX XXX ON = keyspace.customer_sensor_tagids (XXX);

CREATE INDEX XXX ON keyspace.customer_sensor_tagids = (XXX);

CREATE INDEX XXX ON = keyspace.customer_sensor_tagids (XXX);

CREATE INDEX XXX ON keyspace.customer_sensor_tagids = (values(XXX));

CREATE = INDEX XXX ON keyspace.customer_sensor_tagids = (XXX);

------=_NextPart_000_0001_01D4D376.42B6B9A0--