From: Mark Farnan
To: user@cassandra.apache.org
Date: Sun, 11 May 2014 23:44:57 +0300
Subject: Question about READS in a multi DC environment.

I'm trying to understand READ load in Cassandra across a multi-datacenter cluster (specifically, why reads seem to be hitting more than one DC), and I hope someone can help.

From what I'm seeing here, a READ with consistency LOCAL_ONE seems to be hitting all 3 datacenters, rather than just the one I'm connected to. I see 'Read 1001 live and 0 tombstoned cells' from EACH of the 3 DCs in the trace, which seems wrong.

I have tried every consistency level, with the same result. It is also the same from my C# code via the DataStax driver (where I first noticed the issue).

Can someone please shed some light on what is occurring? Specifically, as a rule I don't want a query on one DC going anywhere near the other 2, because in production these DCs will be across slower links.

Query (NOTE: whilst this uses a KairosDB table, I'm just playing with queries against it, as it has 100k columns in this key for testing):

cqlsh:kairosdb> consistency local_one
Consistency level set to LOCAL_ONE.
cqlsh:kairosdb> select * from data_points where key = 0x6d61726c796e2e746573742e74656d70340000000145b514a400726f6f6d3d6f66666963653a limit 1000;

... Some returned data rows were listed here, which I've removed ...

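The partition key in this query is a raw blob. Assuming KairosDB's usual row-key layout (metric name, a 0x00 terminator, an 8-byte big-endian row timestamp in milliseconds, then the tag string), which is an assumption and not something stated in this thread, a short Python sketch can decode it. decode_row_key is a hypothetical helper name.

```python
import struct
from datetime import datetime, timezone

# Partition key copied from the select statement above.
ROW_KEY_HEX = "6d61726c796e2e746573742e74656d70340000000145b514a400726f6f6d3d6f66666963653a"


def decode_row_key(hex_key: str) -> tuple:
    """Split a KairosDB-style row key (assumed layout) into its parts."""
    raw = bytes.fromhex(hex_key)
    sep = raw.index(0)                    # first 0x00 byte ends the metric name
    metric = raw[:sep].decode("utf-8")
    # Next 8 bytes: big-endian signed 64-bit row timestamp in milliseconds.
    (row_time_ms,) = struct.unpack(">q", raw[sep + 1:sep + 9])
    tags = raw[sep + 9:].decode("utf-8")  # remaining bytes: "name=value:" pairs
    return metric, row_time_ms, tags


metric, row_time_ms, tags = decode_row_key(ROW_KEY_HEX)
row_time = datetime.fromtimestamp(row_time_ms / 1000, tz=timezone.utc)
print(metric, row_time.isoformat(), tags)
# prints: marlyn.test.temp4 2014-05-01T00:00:00+00:00 room=office:
```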
Query Response Trace:

activity | timestamp | source | source_elapsed
-----------------------------------------------
execute_cql3_query | 07:18:12,692 | 192.168.25.111 | 0
Message received from /192.168.25.111 | 07:18:00,706 | 192.168.25.131 | 50
Executing single-partition query on data_points | 07:18:00,707 | 192.168.25.131 | 760
Acquiring sstable references | 07:18:00,707 | 192.168.25.131 | 814
Merging memtable tombstones | 07:18:00,707 | 192.168.25.131 | 924
Bloom filter allows skipping sstable 191 | 07:18:00,707 | 192.168.25.131 | 1050
Bloom filter allows skipping sstable 190 | 07:18:00,707 | 192.168.25.131 | 1166
Key cache hit for sstable 189 | 07:18:00,707 | 192.168.25.131 | 1275
Seeking to partition beginning in data file | 07:18:00,707 | 192.168.25.131 | 1293
Skipped 0/3 non-slice-intersecting sstables, included 0 due to tombstones | 07:18:00,708 | 192.168.25.131 | 2173
Merging data from memtables and 1 sstables | 07:18:00,708 | 192.168.25.131 | 2195
Read 1001 live and 0 tombstoned cells | 07:18:00,709 | 192.168.25.131 | 3259
Enqueuing response to /192.168.25.111 | 07:18:00,710 | 192.168.25.131 | 4006
Sending message to /192.168.25.111 | 07:18:00,710 | 192.168.25.131 | 4210
Parsing select * from data_points where key = 0x6d61726c796e2e746573742e74656d70340000000145b514a400726f6f6d3d6f66666963653a limit 1000; | 07:18:12,692 | 192.168.25.111 | 52
Preparing statement | 07:18:12,692 | 192.168.25.111 | 257
Sending message to /192.168.25.121 | 07:18:12,693 | 192.168.25.111 | 1099
Sending message to /192.168.25.131 | 07:18:12,693 | 192.168.25.111 | 1254
Executing single-partition query on data_points | 07:18:12,693 | 192.168.25.111 | 1269
Acquiring sstable references | 07:18:12,693 | 192.168.25.111 | 1284
Merging memtable tombstones | 07:18:12,694 | 192.168.25.111 | 1315
Key cache hit for sstable 205 | 07:18:12,694 | 192.168.25.111 | 1592
Seeking to partition beginning in data file | 07:18:12,694 | 192.168.25.111 | 1606
Skipped 0/1 non-slice-intersecting sstables, included 0 due to tombstones | 07:18:12,695 | 192.168.25.111 | 2423
Merging data from memtables and 1 sstables | 07:18:12,695 | 192.168.25.111 | 2498
Read 1001 live and 0 tombstoned cells | 07:18:12,695 | 192.168.25.111 | 3167
Message received from /192.168.25.121 | 07:18:12,697 | 192.168.25.111 | null
Processing response from /192.168.25.121 | 07:18:12,697 | 192.168.25.111 | null
Message received from /192.168.25.131 | 07:18:12,699 | 192.168.25.111 | null
Processing response from /192.168.25.131 | 07:18:12,699 | 192.168.25.111 | null
Message received from /192.168.25.111 | 07:19:49,432 | 192.168.25.121 | 68
Executing single-partition query on data_points | 07:19:49,433 | 192.168.25.121 | 824
Acquiring sstable references | 07:19:49,433 | 192.168.25.121 | 840
Merging memtable tombstones | 07:19:49,433 | 192.168.25.121 | 898
Bloom filter allows skipping sstable 193 | 07:19:49,433 | 192.168.25.121 | 983
Key cache hit for sstable 192 | 07:19:49,433 | 192.168.25.121 | 1055
Seeking to partition beginning in data file | 07:19:49,433 | 192.168.25.121 | 1073
Skipped 0/2 non-slice-intersecting sstables, included 0 due to tombstones | 07:19:49,434 | 192.168.25.121 | 1803
Merging data from memtables and 1 sstables | 07:19:49,434 | 192.168.25.121 | 1839
Read 1001 live and 0 tombstoned cells | 07:19:49,434 | 192.168.25.121 | 2518
Enqueuing response to /192.168.25.111 | 07:19:49,435 | 192.168.25.121 | 3026
Sending message to /192.168.25.111 | 07:19:49,435 | 192.168.25.121 | 3128
Request complete | 07:18:12,696 | 192.168.25.111 | 4387

Other stats about the cluster:

[root@cdev101 conf]# nodetool status
Datacenter: DC3
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address         Load      Tokens  Owns   Host ID                               Rack
UN  192.168.25.131  80.67 MB  256     34.2%  6ec61643-17d4-4a2e-8c44-57e08687a957  RAC1
Datacenter: DC2
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address         Load      Tokens  Owns   Host ID                               Rack
UN  192.168.25.121  79.46 MB  256     30.6%  976626fb-ea80-405b-abb0-eae703b0074d  RAC1
Datacenter: DC1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address         Load      Tokens  Owns   Host ID                               Rack
UN  192.168.25.111  61.82 MB  256     35.2%  9475e2da-d926-42d0-83fb-0188d0f8f438  RAC1

cqlsh> describe keyspace kairosdb

CREATE KEYSPACE kairosdb WITH replication = {
  'class': 'NetworkTopologyStrategy',
  'DC2': '1',
  'DC3': '1',
  'DC1': '1'
};

USE kairosdb;

CREATE TABLE data_points (
  key blob,
  column1 blob,
  value blob,
  PRIMARY KEY (key, column1)
) WITH COMPACT STORAGE AND
  bloom_filter_fp_chance=0.010000 AND
  caching='KEYS_ONLY' AND
  comment='' AND
  dclocal_read_repair_chance=0.000000 AND
  gc_grace_seconds=864000 AND
  index_interval=128 AND
  read_repair_chance=1.000000 AND
  replicate_on_write='true' AND
  populate_io_cache_on_flush='false' AND
  default_time_to_live=0 AND
  speculative_retry='NONE' AND
  memtable_flush_period_in_ms=0 AND
  compaction={'class': 'SizeTieredCompactionStrategy'} AND
  compression={'sstable_compression': 'LZ4Compressor'};
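To see which datacenters took part in the read, the trace rows can be grouped by their source column and mapped to DCs using the nodetool status output. A minimal Python sketch; the helper names are hypothetical and only a few of the trace rows are copied in:

```python
import re
from collections import defaultdict

# Node-to-DC mapping taken from the nodetool status output above.
NODE_DC = {
    "192.168.25.111": "DC1",
    "192.168.25.121": "DC2",
    "192.168.25.131": "DC3",
}

# Selected rows from the trace above: activity | timestamp | source | source_elapsed
TRACE = """\
Read 1001 live and 0 tombstoned cells | 07:18:00,709 | 192.168.25.131 | 3259
Sending message to /192.168.25.121 | 07:18:12,693 | 192.168.25.111 | 1099
Sending message to /192.168.25.131 | 07:18:12,693 | 192.168.25.111 | 1254
Read 1001 live and 0 tombstoned cells | 07:18:12,695 | 192.168.25.111 | 3167
Read 1001 live and 0 tombstoned cells | 07:19:49,434 | 192.168.25.121 | 2518
"""

READ_RE = re.compile(r"Read (\d+) live and \d+ tombstoned cells")
SEND_RE = re.compile(r"Sending message to /([\d.]+)")


def rows(trace):
    """Yield (activity, source) for every row that has the four trace columns."""
    for line in trace.splitlines():
        parts = [p.strip() for p in line.split("|")]
        if len(parts) == 4:
            yield parts[0], parts[2]


def cells_read_per_dc(trace, node_dc):
    """Sum live cells read, grouped by the DC of the node that read them."""
    totals = defaultdict(int)
    for activity, source in rows(trace):
        match = READ_RE.search(activity)
        if match:
            totals[node_dc.get(source, source)] += int(match.group(1))
    return dict(totals)


def dcs_sent_to(trace, node_dc, coordinator):
    """DCs that the coordinator forwarded the request to."""
    targets = set()
    for activity, source in rows(trace):
        match = SEND_RE.search(activity)
        if source == coordinator and match:
            targets.add(node_dc.get(match.group(1), match.group(1)))
    return targets


# Each DC reads 1001 cells; the DC1 coordinator forwarded to DC2 and DC3.
print(cells_read_per_dc(TRACE, NODE_DC))
print(dcs_sent_to(TRACE, NODE_DC, "192.168.25.111"))
```

The timestamp column differs per node (07:18:00 on .131, 07:19:49 on .121, 07:18:12 on the coordinator), so it appears to reflect each host's own clock; source_elapsed is measured on each node itself and is the safer column to compare.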
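One setting in the table definition that may be relevant is read_repair_chance=1.000000 (with dclocal_read_repair_chance=0.000000). The Python sketch below is a simplified model, not Cassandra's implementation. It assumes Cassandra 2.0 behaviour in which each read draws one random number: below read_repair_chance the read triggers a global read repair that queries every replica in every DC; otherwise, below dclocal_read_repair_chance, it triggers a repair within the local DC only. All function names are hypothetical.

```python
import random

# Settings copied from the kairosdb schema above.
REPLICATION = {"DC1": 1, "DC2": 1, "DC3": 1}
READ_REPAIR_CHANCE = 1.0          # read_repair_chance=1.000000
DCLOCAL_READ_REPAIR_CHANCE = 0.0  # dclocal_read_repair_chance=0.000000


def read_repair_decision(rr_chance, dclocal_rr_chance, rng=random.random):
    """One random draw per read, checked against the global chance first."""
    draw = rng()  # uniform in [0.0, 1.0)
    if rr_chance > draw:
        return "GLOBAL"
    if dclocal_rr_chance > draw:
        return "DC_LOCAL"
    return "NONE"


def replicas_needed(level, local_dc, replication=REPLICATION):
    """Minimum replicas that must answer for the consistency level."""
    total = sum(replication.values())
    local = replication[local_dc]
    return {
        "ONE": 1,
        "LOCAL_ONE": 1,
        "LOCAL_QUORUM": local // 2 + 1,
        "QUORUM": total // 2 + 1,
        "ALL": total,
    }[level]


def replicas_queried(level, local_dc, decision, replication=REPLICATION):
    """Replicas the coordinator sends the read to under this model."""
    if decision == "GLOBAL":
        return sum(replication.values())  # every replica, in every DC
    needed = replicas_needed(level, local_dc, replication)
    if decision == "DC_LOCAL":
        # Assumes the needed replicas are picked from the local DC first.
        return max(needed, replication[local_dc])
    return needed


for level in ("LOCAL_ONE", "ONE", "LOCAL_QUORUM", "QUORUM", "ALL"):
    decision = read_repair_decision(READ_REPAIR_CHANCE, DCLOCAL_READ_REPAIR_CHANCE)
    # With read_repair_chance=1.0 every level comes out as GLOBAL / 3 replicas.
    print(level, decision, replicas_queried(level, "DC1", decision))
```

Under this model a read_repair_chance of 1.0 sends every read to all three DCs regardless of consistency level, which would match the traces above.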