From: aaron morton
Subject: Re: Secondary index query + 2 Datacenters + Row Cache + Restart = 0 rows
Date: Sun, 3 Feb 2013 11:00:44 +1300
To: user@cassandra.apache.org

Can you run the select in cqlsh with tracing enabled (see the cqlsh online help)?

If you can replicate it, then please raise a ticket on https://issues.apache.org/jira/browse/CASSANDRA and update this email thread.

Thanks

-----------------
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 1/02/2013, at 9:03 PM, Alexei Bakanov <russisk@gmail.com> wrote:

Hello,

I've found a combination that doesn't work:
A column family that has a secondary index and caching='ALL', with
data in two datacenters. After I restart the nodes, my
secondary index queries start returning 0 rows.
It happens only when the amount of data goes over a certain threshold, so I
suspect that compactions are involved in this as well.
Taking out one of the ingredients fixes the problem, and my queries
return rows from the secondary index again.
I suspect that this person is struggling with the same thing:
https://issues.apache.org/jira/browse/CASSANDRA-4785

Here is a sequence of actions that reproduces it with the help of CCM:

$ ccm create --cassandra-version 1.2.1 --nodes 2 -p RandomPartitioner testRowCacheDC
$ ccm updateconf 'endpoint_snitch: PropertyFileSnitch'
$ ccm updateconf 'row_cache_size_in_mb: 200'
$ cp ~/Downloads/cassandra-topology.properties ~/.ccm/testRowCacheDC/node1/conf/      (please find the .properties file below)
$ cp ~/Downloads/cassandra-topology.properties ~/.ccm/testRowCacheDC/node2/conf/
$ ccm start
$ ccm cli
 ->create keyspace and column family (please find the schema below)
$ python populate_rowcache.py
$ ccm stop  (I tried flush first, it doesn't help)
$ ccm start
$ ccm cli
Connected to: "testRowCacheDC" on 127.0.0.1/9160
Welcome to Cassandra CLI version 1.2.1-SNAPSHOT

Type 'help;' or '?' for help.
Type 'quit;' or 'exit;' to quit.

[default@unknown] use testks;
Authenticated to keyspace: testks
[default@testks] get cf1 where 'indexedColumn'='userId_75';

0 Row Returned.
Elapsed time: 68 msec(s).
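
For anyone who wants to repeat the shell steps above without retyping them, here is a minimal Python sketch that only lists them as commands. The function name `repro_steps` and its parameters are hypothetical; the commands and paths are copied from the steps above. The interactive `ccm cli` steps are left out, so the schema still has to be created by hand before the populate script runs.

```python
import os
import shlex


def repro_steps(cluster="testRowCacheDC",
                topology="~/Downloads/cassandra-topology.properties"):
    """Return the non-interactive reproduction steps as argv lists.

    Assumption: the paths and the cluster name match the message above.
    """
    topology = os.path.expanduser(topology)
    conf_dirs = [os.path.expanduser("~/.ccm/%s/node%d/conf/" % (cluster, n))
                 for n in (1, 2)]
    steps = [
        ["ccm", "create", "--cassandra-version", "1.2.1", "--nodes", "2",
         "-p", "RandomPartitioner", cluster],
        ["ccm", "updateconf", "endpoint_snitch: PropertyFileSnitch"],
        ["ccm", "updateconf", "row_cache_size_in_mb: 200"],
    ]
    # Copy the topology file into the conf directory of each node.
    steps += [["cp", topology, conf_dir] for conf_dir in conf_dirs]
    steps += [
        ["ccm", "start"],
        # The keyspace and column family must exist before this step.
        ["python", "populate_rowcache.py"],
        ["ccm", "stop"],
        ["ccm", "start"],
    ]
    return steps


if __name__ == "__main__":
    # Print the commands instead of running them.
    for argv in repro_steps():
        print(" ".join(shlex.quote(arg) for arg in argv))
```

Printing rather than executing is deliberate: the steps change a local cluster, and the schema step in between is manual.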

My Cassandra instances run with -Xms1927M -Xmx1927M -Xmn400M
Thanks for your help.

Best regards,
Alexei


------ START cassandra-topology.properties ----------
127.0.0.1=DC1:RAC1
127.0.0.2=DC2:RAC1
default=DC1:r1
------ FINISH cassandra-topology.properties ----------
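
The topology file above places one node in each datacenter. A quick way to double-check that is a small Python sketch like the one below. It assumes the simple `address=DC:RACK` layout shown above; `parse_topology` and `datacenters` are hypothetical helpers, not part of Cassandra or CCM, and the sketch does not handle escaped characters.

```python
def parse_topology(text):
    """Parse `address=DC:RACK` lines into {address: (datacenter, rack)}."""
    topology = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        address, _, location = line.partition("=")
        datacenter, _, rack = location.partition(":")
        topology[address.strip()] = (datacenter.strip(), rack.strip())
    return topology


def datacenters(topology):
    """Return the sorted datacenter names of the explicitly listed nodes."""
    return sorted({dc for address, (dc, _rack) in topology.items()
                   if address != "default"})


SAMPLE = """\
127.0.0.1=DC1:RAC1
127.0.0.2=DC2:RAC1
default=DC1:r1
"""

topology = parse_topology(SAMPLE)
print(topology["127.0.0.2"])   # ('DC2', 'RAC1')
print(datacenters(topology))   # ['DC1', 'DC2']
```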

------ START cassandra-cli schema -----------
create keyspace testks
  with placement_strategy = 'NetworkTopologyStrategy'
  and strategy_options = {DC2 : 1, DC1 : 1}
  and durable_writes = true;

use testks;

create column family cf1
  with column_type = 'Standard'
  and comparator = 'org.apache.cassandra.db.marshal.AsciiType'
  and default_validation_class = 'UTF8Type'
  and key_validation_class = 'UTF8Type'
  and read_repair_chance = 1.0
  and dclocal_read_repair_chance = 0.0
  and gc_grace = 864000
  and min_compaction_threshold = 4
  and max_compaction_threshold = 32
  and replicate_on_write = true
  and compaction_strategy = 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy'
  and caching = 'ALL'
  and column_metadata = [
    {column_name : 'indexedColumn',
    validation_class : UTF8Type,
    index_name : 'INDEX1',
    index_type : 0}]
  and compression_options = {'sstable_compression' : 'org.apache.cassandra.io.compress.SnappyCompressor'};
-------FINISH cassandra-cli schema -----------

------ START populate_rowcache.py -----------
from pycassa.batch import Mutator

import pycassa

pool = pycassa.ConnectionPool('testks', timeout=5)
cf = pycassa.ColumnFamily(pool, 'cf1')

for userId in xrange(0, 1000):
    print userId
    b = Mutator(pool, queue_size=200)
    for itemId in xrange(20):
        rowKey = 'userId_%s:itemId_%s'%(userId, itemId)
        for message_number in xrange(10):
            b.insert(cf, rowKey, {'indexedColumn': 'userId_%s'%userId,
                                  str(message_number): str(message_number)})
    b.send()

pool.dispose()
------ FINISH populate_rowcache.py -----------
