From: Eric Stevens
Date: Mon, 23 Mar 2015 12:53:05 -0600
Subject: Re: Really high read latency
To: user@cassandra.apache.org

Enable tracing in cqlsh and see how many sstables are being lifted to
satisfy the query (are you repeatedly writing to the same partition
[row_time] over time?). Also watch for whether you're hitting a lot of
tombstones (are you deleting lots of values in the same partition over
time?).
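For example, something like this in cqlsh (a rough sketch; the row_time
and attrs values are taken from your example query below, and the LIMIT
is only there to keep the trace short):

    TRACING ON;
    SELECT * FROM "default".metrics
    WHERE row_time = 5 AND attrs = 'potatoes_and_jam'
    LIMIT 100;
    -- The trace printed after the query shows how many sstables were
    -- read and how many tombstone cells were scanned for this partition.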
On Mon, Mar 23, 2015 at 4:01 AM, Dave Galbraith
<david92galbraith@gmail.com> wrote:

> Duncan: I'm thinking it might be something like that. I'm also seeing
> just a ton of garbage collection on the box. Could it be pulling rows
> for all 100k attrs for a given row_time into memory, since only
> row_time is the partition key?
>
> Jens: I'm not using EBS (although I used to, until I read up on how
> useless it is). I'm not sure what constitutes proper paging, but my
> client has a pretty small amount of available memory, so I'm doing
> pages of size 5k using the DataStax C++ driver.
>
> Thanks for the replies!
>
> -Dave
>
> On Mon, Mar 23, 2015 at 2:00 AM, Jens Rantil <jens.rantil@tink.se> wrote:
>
>> Also, two control questions:
>>
>> - Are you using EBS for data storage? It might introduce additional
>>   latencies.
>> - Are you doing proper paging when querying the keyspace?
>>
>> Cheers,
>> Jens
>>
>> On Mon, Mar 23, 2015 at 5:56 AM, Dave Galbraith
>> <david92galbraith@gmail.com> wrote:
>>
>>> Hi! So I've got a table like this:
>>>
>>> CREATE TABLE "default".metrics (
>>>     row_time int,
>>>     attrs varchar,
>>>     offset int,
>>>     value double,
>>>     PRIMARY KEY (row_time, attrs, offset)
>>> ) WITH COMPACT STORAGE
>>>   AND bloom_filter_fp_chance=0.01
>>>   AND caching='KEYS_ONLY'
>>>   AND comment=''
>>>   AND dclocal_read_repair_chance=0
>>>   AND gc_grace_seconds=864000
>>>   AND index_interval=128
>>>   AND read_repair_chance=1
>>>   AND replicate_on_write='true'
>>>   AND populate_io_cache_on_flush='false'
>>>   AND default_time_to_live=0
>>>   AND speculative_retry='NONE'
>>>   AND memtable_flush_period_in_ms=0
>>>   AND compaction={'class':'DateTieredCompactionStrategy','timestamp_resolution':'MILLISECONDS'}
>>>   AND compression={'sstable_compression':'LZ4Compressor'};
>>>
>>> I'm running Cassandra on an EC2 m3.2xlarge out in the cloud, with 4 GB
>>> of heap space. It's timeseries data: I increment "row_time" each day,
>>> "attrs" is additional identifying information about each series, and
>>> "offset" is the number of milliseconds into the day for each data
>>> point. For the past 5 days I've been inserting 3k points/second
>>> distributed across 100k distinct "attrs"es. Now when I try to run
>>> queries on this data that look like
>>>
>>>     SELECT * FROM "default".metrics WHERE row_time = 5 AND attrs = 'potatoes_and_jam';
>>>
>>> it takes an absurdly long time and sometimes just times out. I ran
>>> "nodetool cfstats default" and here's what I get:
>>>
>>> Keyspace: default
>>>     Read Count: 59
>>>     Read Latency: 397.12523728813557 ms
>>>     Write Count: 155128
>>>     Write Latency: 0.3675690719921613 ms
>>>     Pending Flushes: 0
>>>         Table: metrics
>>>         SSTable count: 26
>>>         Space used (live): 35146349027
>>>         Space used (total): 35146349027
>>>         Space used by snapshots (total): 0
>>>         SSTable Compression Ratio: 0.10386468749216264
>>>         Memtable cell count: 141800
>>>         Memtable data size: 31071290
>>>         Memtable switch count: 41
>>>         Local read count: 59
>>>         Local read latency: 397.126 ms
>>>         Local write count: 155128
>>>         Local write latency: 0.368 ms
>>>         Pending flushes: 0
>>>         Bloom filter false positives: 0
>>>         Bloom filter false ratio: 0.00000
>>>         Bloom filter space used: 2856
>>>         Compacted partition minimum bytes: 104
>>>         Compacted partition maximum bytes: 36904729268
>>>         Compacted partition mean bytes: 986530969
>>>         Average live cells per slice (last five minutes): 501.66101694915255
>>>         Maximum live cells per slice (last five minutes): 502.0
>>>         Average tombstones per slice (last five minutes): 0.0
>>>         Maximum tombstones per slice (last five minutes): 0.0
>>>
>>> Ouch! 400ms of read latency, orders of magnitude higher than it has
>>> any right to be. How could this have happened? Is there something
>>> fundamentally broken about my data model? Thanks!
>>
>> --
>> Jens Rantil
>> Backend engineer
>> Tink AB
>>
>> Email: jens.rantil@tink.se
>> Phone: +46 708 84 18 32
>> Web: www.tink.se
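One more thought on the data model question at the bottom: with only
row_time as the partition key, a whole day's worth of data for all 100k
"attrs"es lands in a single partition, which lines up with the ~37 GB
"Compacted partition maximum bytes" above. A rough sketch of a schema
that bounds partition size by promoting attrs into the partition key
(the table name is made up, and existing data would have to be migrated):

    CREATE TABLE "default".metrics_by_series (  -- hypothetical name
        row_time int,
        attrs varchar,
        offset int,
        value double,
        PRIMARY KEY ((row_time, attrs), offset)
    );
    -- Carry over the DateTieredCompactionStrategy / LZ4 compression
    -- options from the original table as needed.

Your example query would still work unchanged, since it already
restricts both row_time and attrs.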