From: Jens Rantil <jens.rantil@tink.se>
Date: Wed, 2 Nov 2016 10:43:30 -1000
Subject: Re: Cassandra Poor Read Performance Response Time
To: Cassandra Group <user@cassandra.apache.org>

Hi,

I am by no means an expert on Cassandra, nor on DateTieredCompactionStrategy. However, looking in "Query 2.xlsx" I see a lot of

    Partition index with 0 entries found for sstable 186

To me, that looks like Cassandra is inspecting a lot of sstables and realizing only late in the read path that they don't contain any relevant data. Are you using TTLs when you write data? Do the TTLs vary? If they do, there is a risk that Cassandra has to inspect many sstables that turn out to hold only expired data.

Also, have you checked `nodetool cfstats` and its bloom filter false positive counts? Does `nodetool cfhistograms` give you any insights? I'm mostly thinking in terms of unbalanced partition keys.

Have you checked the logs for how long the GC pauses are?

Somewhat implementation specific: would adjusting the time bucket to a smaller time resolution be an option?

Also, since you are using DateTieredCompactionStrategy, have you considered using a TIMESTAMP constraint [1]? That might actually help you a lot.

[1] https://issues.apache.org/jira/browse/CASSANDRA-5514
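To make the checks above concrete, here is a rough sketch of the commands I have in mind (the log path assumes a standard package install; adjust it, and the keyspace/table names, to your setup):

    # Bloom filter false positives and per-table sstable counts:
    nodetool cfstats tracker.all_ad_impressions_counter_1d

    # Partition size and cell count distribution; unbalanced partitions show up here:
    nodetool cfhistograms tracker all_ad_impressions_counter_1d

    # GC pauses, as logged by GCInspector:
    grep GCInspector /var/log/cassandra/system.log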
Cheers,
Jens

On Mon, Oct 31, 2016 at 11:10 PM, _ _ <rage39a@hotmail.com> wrote:

> Hi
>
> Currently I am running a Cassandra cluster of 3 nodes (with data replicated to both other nodes) and am experiencing poor read performance, usually getting multi-second response times on queries where I am expecting/needing millisecond response times. Currently I have a table which looks like:
>
>     CREATE TABLE tracker.all_ad_impressions_counter_1d (
>         time_bucket bigint,
>         ad_id text,
>         uc text,
>         count counter,
>         PRIMARY KEY ((time_bucket, ad_id), uc)
>     ) WITH CLUSTERING ORDER BY (uc ASC)
>         AND bloom_filter_fp_chance = 0.01
>         AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
>         AND comment = ''
>         AND compaction = {'base_time_seconds': '3600',
>             'class': 'org.apache.cassandra.db.compaction.DateTieredCompactionStrategy',
>             'max_sstable_age_days': '30', 'max_threshold': '32',
>             'min_threshold': '4', 'timestamp_resolution': 'MILLISECONDS'}
>         AND compression = {'chunk_length_in_kb': '64',
>             'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
>         AND crc_check_chance = 1.0
>         AND dclocal_read_repair_chance = 0.1
>         AND default_time_to_live = 0
>         AND gc_grace_seconds = 864000
>         AND max_index_interval = 2048
>         AND memtable_flush_period_in_ms = 0
>         AND min_index_interval = 128
>         AND read_repair_chance = 0.0
>         AND speculative_retry = '99PERCENTILE';
>
> and queries which look like:
>
>     SELECT
>         time_bucket,
>         uc,
>         count
>     FROM
>         all_ad_impressions_counter_1d
>     WHERE ad_id = ?
>         AND time_bucket = ?
>
> The cluster is running on servers with 16 GB RAM, 4 CPU cores and 3 x 100 GB datastores; the storage is not local, and the VMs are managed through OpenStack. There are roughly 200 million records written per day (one time_bucket) and at most a few thousand records per partition (time_bucket, ad_id). The volume of writes is not having a significant effect on our read performance: when writes are stopped, the read response time does not improve noticeably. I have attached a trace of one query I ran which took around 3 seconds when I would expect it to take well below a second. I have also included the cassandra.yaml file and the JVM options file. We do intend to change to local storage and expect this to have a significant impact, but I was wondering if there is anything else that could be changed which would also significantly improve read performance?
>
> Thanks
> Ian

--
Jens Rantil
Backend engineer
Tink AB

Email: jens.rantil@tink.se
Phone: +46 708 84 18 32
Web: www.tink.se