From: aaron morton <aaron@thelastpickle.com>
To: user@cassandra.apache.org
Subject: Re: TTL on SecondaryIndex Columns. A bug?
Date: Thu, 20 Dec 2012 10:33:20 +1300

> Please try to run Cassandra with -Xms1927M -Xmx1927M -Xmn400M
Done, and I now get your repro case...

[default@ks123] get cf1 where 'indexedColumn'='65';

0 Row Returned.
Elapsed time: 1.44 msec(s).

[default@ks123] get cf1 where 'indexedColumn'='66';
-------------------
RowKey: 66
=> (column=1, value=val, timestamp=1355952222439049, ttl=7884000)
=> (column=10, value=val, timestamp=1355952222439269, ttl=7884000)
...
=> (column=indexedColumn, value=66, timestamp=1355952223881937, ttl=7887600)

Looking into it now.

Thanks

http://www.thelastpickle.com

On 19/12/2012, at 9:56 PM, Roland Gude <roland.gude@ez.no> wrote:

I think this might be https://issues.apache.org/jira/browse/CASSANDRA-4670
Unfortunately, apart from me, no one has been able to reproduce it yet.

Check whether the data is available before and after compaction.
If you use leveled compaction it is hard to test, because you cannot trigger a compaction manually.

-----Original Message-----
From: Alexei Bakanov [mailto:russisk@gmail.com]
Sent: Wednesday, 19 December 2012 09:35
To: user@cassandra.apache.org
Subject: Re: TTL on SecondaryIndex Columns. A bug?

I'm running on a single node on my laptop.
It looks like the point at which rows disappear from the index depends on the JVM memory settings: with more memory, more data has to be fed in before rows start disappearing.
Please try to run Cassandra with -Xms1927M -Xmx1927M -Xmn400M
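For anyone trying this: the three flags are the initial heap (-Xms), maximum heap (-Xmx) and young-generation size (-Xmn). A minimal sketch of how they relate to the two heap variables in conf/cassandra-env.sh, assuming (as in the stock 1.1-era script, to my understanding) that -Xms and -Xmx are both taken from MAX_HEAP_SIZE and -Xmn from HEAP_NEWSIZE. The heap_flags helper is hypothetical; in practice you would set MAX_HEAP_SIZE="1927M" and HEAP_NEWSIZE="400M" in that file rather than run Python.

```python
from typing import List


def heap_flags(max_heap_size: str, heap_newsize: str) -> List[str]:
    """Build JVM heap flags the way cassandra-env.sh is assumed to:
    one value for both the initial (-Xms) and maximum (-Xmx) heap,
    plus an explicit young-generation size (-Xmn)."""
    for value in (max_heap_size, heap_newsize):
        # Expect a JVM size string such as "1927M" or "2G".
        if not value[:-1].isdigit() or value[-1].upper() not in "KMG":
            raise ValueError(f"unexpected JVM size: {value!r}")
    return [f"-Xms{max_heap_size}", f"-Xmx{max_heap_size}", f"-Xmn{heap_newsize}"]


# The settings suggested above for reproducing the problem.
print(" ".join(heap_flags("1927M", "400M")))  # -Xms1927M -Xmx1927M -Xmn400M
```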

To be sure, try to get rows for 'indexedColumn'='1':

[default@ks123] get cf1 where 'indexedColumn'='1';

0 Row Returned.

Thanks


On 19 December 2012 05:15, aaron morton <aaron@thelastpickle.com> wrote:
Thanks for the nice steps to reproduce.

I ran this on my MBP using C* 1.1.7 and got the expected results; both
gets returned a row.

Were you running against a single node or a cluster? If a cluster, did
you change the CL? cassandra-cli defaults to ONE.
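Why the consistency level only matters on a cluster can be worked out with the usual replica arithmetic. A minimal sketch, assuming the standard definitions QUORUM = floor(RF / 2) + 1 and "a read is guaranteed to see the latest write when R + W > RF"; the helper names are hypothetical. With the replication factor of 1 used in the ks123 schema (strategy_options = {datacenter1 : 1}), ONE, QUORUM and ALL all mean the same single replica, so the CLI default of ONE should not by itself explain missing rows on one node.

```python
def replicas_for(level: str, rf: int) -> int:
    """Number of replicas that must respond for a consistency level."""
    levels = {"ONE": 1, "QUORUM": rf // 2 + 1, "ALL": rf}
    return levels[level]


def read_sees_latest_write(read_level: str, write_level: str, rf: int) -> bool:
    """True when the read and write replica sets must overlap (R + W > RF)."""
    return replicas_for(read_level, rf) + replicas_for(write_level, rf) > rf


# RF = 1, as in the ks123 schema: every level is the same single replica.
print([replicas_for(cl, 1) for cl in ("ONE", "QUORUM", "ALL")])  # [1, 1, 1]
print(read_sees_latest_write("ONE", "ONE", 1))                    # True
# RF = 3: ONE/ONE can miss a write, QUORUM/QUORUM cannot.
print(read_sees_latest_write("ONE", "ONE", 3))                    # False
print(read_sees_latest_write("QUORUM", "QUORUM", 3))              # True
```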

Cheers

-----------------
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 18/12/2012, at 9:44 PM, Alexei Bakanov <russisk@gmail.com> wrote:

Hi,

We are having an issue with TTL on secondary index columns. We get 0
rows in return when running queries on indexed columns that have a TTL.
Everything works fine with small amounts of data, but once we get over
a certain threshold it looks like older rows disappear from the index.
In the example below we create 70 rows with 45k columns each, plus one
indexed column whose value is just the row key, so we have one row per
indexed value. When the script has finished, the index contains only rows
66-69. Rows 0-65 are gone from the index.
Using 'indexedColumn' without a TTL fixes the problem.
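When probing a repro like this one indexed value at a time (as with the 'indexedColumn'='65' and '66' queries below), it can help to collapse the results into ranges. A minimal sketch, assuming you have already collected the set of indexed values that returned a row; missing_ranges is a hypothetical helper, not part of pycassa or cassandra-cli. Fed the outcome described above (only 66-69 found out of 70), it reproduces the "rows 0-65 are gone" summary.

```python
from typing import Iterable, List, Optional, Tuple


def missing_ranges(found: Iterable[int], total: int) -> List[Tuple[int, int]]:
    """Return inclusive (start, end) ranges of values in 0..total-1
    that the index query did NOT return."""
    found_set = set(found)
    ranges: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for value in range(total):
        if value not in found_set:
            if start is None:
                start = value                  # open a new gap
        elif start is not None:
            ranges.append((start, value - 1))  # close the current gap
            start = None
    if start is not None:
        ranges.append((start, total - 1))      # gap runs to the last value
    return ranges


# Outcome reported for the 70-row script: only rows 66-69 are still indexed.
print(missing_ranges(range(66, 70), 70))  # [(0, 65)]
```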


------------- SCHEMA START -----------------
create keyspace ks123
  with placement_strategy = 'NetworkTopologyStrategy'
  and strategy_options = {datacenter1 : 1}
  and durable_writes = true;

use ks123;

create column family cf1
  with column_type = 'Standard'
  and comparator = 'AsciiType'
  and default_validation_class = 'AsciiType'
  and key_validation_class = 'AsciiType'
  and read_repair_chance = 0.1
  and dclocal_read_repair_chance = 0.0
  and gc_grace = 864000
  and min_compaction_threshold = 4
  and max_compaction_threshold = 32
  and replicate_on_write = true
  and compaction_strategy = 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy'
  and caching = 'KEYS_ONLY'
  and column_metadata = [
    {column_name : 'indexedColumn',
    validation_class : AsciiType,
    index_name : 'INDEX1',
    index_type : 0}]
  and compression_options = {'sstable_compression' : 'org.apache.cassandra.io.compress.SnappyCompressor'};
------------- SCHEMA FINISH -----------------

------------- POPULATE START -----------------
# pycassa runs on Python 2
from pycassa.batch import Mutator
import pycassa

pool = pycassa.ConnectionPool('ks123')
cf = pycassa.ColumnFamily(pool, 'cf1')

for rowKey in xrange(70):
    # One Mutator per row; it also sends its queued mutations on its own
    # whenever queue_size is reached, not only when send() is called.
    b = Mutator(pool)
    # 45,000 data columns, each with a TTL of 7884000 seconds
    for datapoint in xrange(1, 45001):
        b.insert(cf, str(rowKey), {str(datapoint): 'val'}, ttl=7884000)
    # The indexed column: its value is the row key, TTL of 7887600 seconds
    b.insert(cf, str(rowKey), {'indexedColumn': str(rowKey)}, ttl=7887600)
    print 'row %d' % rowKey
    b.send()

pool.dispose()
------------- POPULATE FINISH -----------------
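To put the script's numbers in perspective, a small worked calculation (plain Python, no Cassandra needed) of how long each TTL is and how much data the script writes. It only restates what the script does: 45,000 data columns per row with ttl=7884000, one indexed column with ttl=7887600, for 70 rows. The constant names are just labels for those literals.

```python
SECONDS_PER_DAY = 24 * 60 * 60

DATA_TTL = 7884000      # ttl on each of the 45,000 data columns per row
INDEX_TTL = 7887600     # ttl on 'indexedColumn'
ROWS = 70
COLUMNS_PER_ROW = 45000

print(DATA_TTL / SECONDS_PER_DAY)      # 91.25 (days)
print(INDEX_TTL - DATA_TTL)            # 3600 (indexed column outlives the data by 1 hour)
print(ROWS * COLUMNS_PER_ROW)          # 3150000 TTL'd data columns
print(ROWS * (COLUMNS_PER_ROW + 1))    # 3150070 inserts in total
```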

------------- QUERY START -----------------
[default@ks123] get cf1 where 'indexedColumn'='65';

0 Row Returned.
Elapsed time: 2.38 msec(s).

[default@ks123] get cf1 where 'indexedColumn'='66';
-------------------
RowKey: 66
=> (column=1, value=val, timestamp=1355818765548964, ttl=7884000)
...
=> (column=10087, value=val, timestamp=1355818766075538, ttl=7884000)
=> (column=indexedColumn, value=66, timestamp=1355818768119334, ttl=7887600)

1 Row Returned.
Elapsed time: 31 msec(s).
------------- QUERY FINISH -----------------
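One quick sanity check is whether any of these columns could already have expired. A minimal sketch, assuming the write timestamps shown are microseconds since the Unix epoch (pycassa's default, as far as I know) and the TTLs are seconds; expires_at is a hypothetical helper. For row 66 it puts expiry in mid-March 2013, roughly three months after the write, and since all 70 rows were written in one run with the same TTLs, ordinary TTL expiry should not account for rows 0-65 vanishing from the index.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def expires_at(timestamp_us: int, ttl_s: int) -> datetime:
    """Expiry time of a column, assuming the write timestamp is in
    microseconds since the epoch and the TTL is in seconds."""
    return EPOCH + timedelta(microseconds=timestamp_us) + timedelta(seconds=ttl_s)


# When 'indexedColumn' for row 66 was written (integer arithmetic, no float rounding).
written = EPOCH + timedelta(microseconds=1355818768119334)
print(written.isoformat())                                # 2012-12-18T08:19:28.119334+00:00
print(expires_at(1355818765548964, 7884000).isoformat())  # column '1': 2013-03-19T14:19:25.548964+00:00
print(expires_at(1355818768119334, 7887600).isoformat())  # 'indexedColumn': 2013-03-19T15:19:28.119334+00:00
```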

This is all using Cassandra 1.1.7 with default settings.

Best regards,

Alexei Bakanov




