Date: Fri, 9 Oct 2015 22:40:05 +0000 (UTC)
From: "Tyler Hobbs (JIRA)"
To: commits@cassandra.apache.org
Reply-To: dev@cassandra.apache.org
Subject: [jira] [Commented] (CASSANDRA-10502) Cassandra query degradation with high frequency updated tables

    [ https://issues.apache.org/jira/browse/CASSANDRA-10502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14951323#comment-14951323 ]

Tyler Hobbs commented on CASSANDRA-10502:
-----------------------------------------

Can you edit your comment and put "\{noformat\}" before and after each trace? That will help to make the text readable.

> Cassandra query degradation with high frequency updated tables
> --------------------------------------------------------------
>
>                 Key: CASSANDRA-10502
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10502
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Dodong Juan
>
> Hi,
> We are developing a system that computes a profile of the things it observes. The observations come in the form of events. Each thing it observes has an id and contains a set of subthings, each of which has a measurement of some kind. There are roughly 500 subthings within each thing. We receive events containing measurements of these 500 subthings every 10 seconds or so.
> As we receive events, we read the old profile value, calculate the new profile based on the new value, and save it back.
> One of the things we observe is the set of processes running on the server.
> We use the following schema to hold the profile.
> {noformat}
> CREATE TABLE processinfometric_profile (
>     profilecontext text,
>     id text,
>     month text,
>     day text,
>     hour text,
>     minute text,
>     command text,
>     cpu map,
>     majorfaults map,
>     minorfaults map,
>     nice map,
>     pagefaults map,
>     pid map,
>     ppid map,
>     priority map,
>     resident map,
>     rss map,
>     sharesize map,
>     size map,
>     starttime map,
>     state map,
>     threads map,
>     user map,
>     vsize map,
>     PRIMARY KEY ((profilecontext, agentid, month, day, hour, minute), command)
> ) WITH CLUSTERING ORDER BY (command ASC)
>     AND bloom_filter_fp_chance = 0.1
>     AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
>     AND comment = ''
>     AND compaction = {'class': 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy'}
>     AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
>     AND dclocal_read_repair_chance = 0.1
>     AND default_time_to_live = 0
>     AND gc_grace_seconds = 864000
>     AND max_index_interval = 2048
>     AND memtable_flush_period_in_ms = 0
>     AND min_index_interval = 128
>     AND read_repair_chance = 0.0
>     AND speculative_retry = '99.0PERCENTILE';
> {noformat}
> This profile will then be used for certain analytics, which can use it in the context of the ‘thing’ or in the context of a specific thing and subthing.
> A profile can be defined as monthly, daily, or hourly. For a monthly profile, the month is set to the current month (e.g. ‘Oct’) and the day and hour are set to the empty string ‘’.
> The problem we have observed is that over time (actually in just a matter of hours) query response for the monthly profile degrades severely. At the start it responds in 10-100 ms, and after a couple of hours it rises to 2000-3000 ms. If you leave it for a couple of days, you start experiencing read timeouts. The query is basically just:
> {noformat}
> select * from myprofile where id='1' and month='Oct' and day='' and hour='' and minute=''
> {noformat}
> This returns only about 500 rows or so.
> We were using Cassandra 2.2.1 but upgraded to 2.2.2 to see if that fixed the issue, to no avail. Since this is a test, we are running on a single node.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)