From: Cyril SCETBON
To: user@cassandra.apache.org
Date: Fri, 21 Dec 2012 16:05:08 +0100
Subject: Re: TTL on SecondaryIndex Columns. A bug?

Nice job Aaron,
AFAIU, you now set gc_before to the current time for secondary indexes. Before your patch it was set to Integer.MAX_VALUE, so the removeDeletedStandard function was testing if (column.getLocalDeletionTime() < MAX_VALUE), which is always true, and so it was removing all rows from the secondary index. Am I right?
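To make the reasoning above concrete, here is a toy Python model of the purge test as described. It is not Cassandra's actual removeDeletedStandard code, which may check more than this single comparison. It assumes that an expiring (TTL) column reports its expiry time as its local deletion time and that a non-expiring column reports Integer.MAX_VALUE; the names `Column`, `expiring`, `non_expiring` and `purge` are hypothetical.

```python
from collections import namedtuple

INT_MAX = 2 ** 31 - 1  # Java's Integer.MAX_VALUE

# Toy model (assumption): an expiring column's local deletion time is its
# expiry time; a non-expiring column reports INT_MAX.
Column = namedtuple("Column", ["name", "local_deletion_time"])


def expiring(name, written_at, ttl):
    """A column written at `written_at` (seconds) that expires after `ttl` seconds."""
    return Column(name, written_at + ttl)


def non_expiring(name):
    """A column without a TTL."""
    return Column(name, INT_MAX)


def purge(columns, gc_before):
    """Keep only columns that pass the test; drop those with local_deletion_time < gc_before."""
    return [c for c in columns if c.local_deletion_time >= gc_before]


now = 1355952222  # seconds since the epoch, around the time of this thread
cols = [expiring("indexedColumn", now, 7887600), non_expiring("plainColumn")]
print([c.name for c in purge(cols, INT_MAX)])  # ['plainColumn']
print([c.name for c in purge(cols, now)])      # ['indexedColumn', 'plainColumn']
```

With gc_before at Integer.MAX_VALUE the TTL'd entry is purged while the non-TTL one survives, which is consistent with the report that dropping the TTL on 'indexedColumn' avoids the problem; with gc_before at the current time, neither live column is purged.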

-- 
Cyril SCETBON

On Dec 20, 2012, at 9:28 PM, aaron morton <aaron@thelastpickle.com> wrote:
Yes, but they will be compacted away again unless the patch is applied.

It's a small patch, so you should be able to apply it easily enough if you need a fix ASAP.
Cheers

-----------------
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com


On 20/12/2012, at 5:27 PM, B. Todd Burruss wrote:

I believe we have hit this as well. If you use nodetool to rebuild_index, does it work?

On Wed, Dec 19, 2012 at 8:10 PM, aaron morton <aaron@thelastpickle.com> wrote:
Well, that was fun: https://issues.apache.org/jira/browse/CASSANDRA-5079

Just testing my idea of a fix now.

Cheers

On 20/12/2012, at 10:33 AM, aaron morton <aaron@thelastpickle.com> wrote:

> Please try to run Cassandra with -Xms1927M -Xmx1927M -Xmn400M

Done, and now I can reproduce your case:

[default@ks123] get cf1 where 'indexedColumn'='65';

0 Row Returned.
Elapsed time: 1.44 msec(s).


[default@ks123] get cf1 where 'indexedColumn'='66';
-------------------
RowKey: 66
=> (column=1, value=val, timestamp=1355952222439049, ttl=7884000)
=> (column=10, value=val, timestamp=1355952222439269, ttl=7884000)
...
=> (column=indexedColumn, value=66, timestamp=1355952223881937, ttl=7887600)

Looking into it now.

Thanks


On 19/12/2012, at 9:56 PM, Roland Gude <roland.gude@ez.no> wrote:

I think this might be https://issues.apache.org/jira/browse/CASSANDRA-4670
Unfortunately, apart from me, no one has been able to reproduce it yet.

Check whether the data is available before and after compaction.
If you use leveled compaction this is hard to test, because you cannot trigger
compaction manually.

-----Original Message-----
From: Alexei Bakanov [mailto:russisk@gmail.com]
Sent: Wednesday, 19 December 2012 09:35
To: user@cassandra.apache.org
Subject: Re: TTL on SecondaryIndex Columns. A bug?

I'm running on a single node on my laptop.
It looks like the point at which rows disappear from the index depends on the
JVM memory settings: with more memory, more data has to be fed in before
rows start disappearing.
Please try to run Cassandra with -Xms1927M -Xmx1927M -Xmn400M

To be sure, try to get rows for 'indexedColumn'='1':

[default@ks123] get cf1 where 'indexedColumn'='1';

0 Row Returned.

Thanks


On 19 December 2012 05:15, aaron morton <aaron@thelastpickle.com> wrote:

Thanks for the nice steps to reproduce.

I ran this on my MBP using C* 1.1.7 and got the expected results: both
gets returned a row.

Were you running against a single node or a cluster? If a cluster, did
you change the CL? cassandra-cli defaults to ONE.

Cheers


On 18/12/2012, at 9:44 PM, Alexei Bakanov <russisk@gmail.com> wrote:

Hi,

We are having an issue with TTL on secondary index columns. We get 0
rows back when running queries on indexed columns that have a TTL.
Everything works fine with small amounts of data, but once we go over
a certain threshold it looks like older rows disappear from the index.
In the example below we create 70 rows with 45k columns each, plus one
indexed column whose value is just the row key, so we have one row per
indexed value. When the script has finished, the index contains rows
66-69; rows 0-65 are gone from the index.
Using 'indexedColumn' without a TTL fixes the problem.


------------- SCHEMA START -----------------
create keyspace ks123
with placement_strategy = 'NetworkTopologyStrategy'
and strategy_options = {datacenter1 : 1}
and durable_writes = true;

use ks123;

create column family cf1
with column_type = 'Standard'
and comparator = 'AsciiType'
and default_validation_class = 'AsciiType'
and key_validation_class = 'AsciiType'
and read_repair_chance = 0.1
and dclocal_read_repair_chance = 0.0
and gc_grace = 864000
and min_compaction_threshold = 4
and max_compaction_threshold = 32
and replicate_on_write = true
and compaction_strategy = 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy'
and caching = 'KEYS_ONLY'
and column_metadata = [
  {column_name : 'indexedColumn',
   validation_class : AsciiType,
   index_name : 'INDEX1',
   index_type : 0}]
and compression_options = {'sstable_compression' : 'org.apache.cassandra.io.compress.SnappyCompressor'};
------------- SCHEMA FINISH -----------------

------------- POPULATE START -----------------
import pycassa
from pycassa.batch import Mutator

pool = pycassa.ConnectionPool('ks123')
cf = pycassa.ColumnFamily(pool, 'cf1')

for rowKey in xrange(70):
    b = Mutator(pool)
    # 45,000 data columns per row, each with a TTL of 7884000 s
    for datapoint in xrange(1, 45001):
        b.insert(cf, str(rowKey), {str(datapoint): 'val'}, ttl=7884000)
    # One indexed column per row whose value is the row key;
    # its TTL (7887600 s) is one hour longer than the data columns'
    b.insert(cf, str(rowKey), {'indexedColumn': str(rowKey)}, ttl=7887600)
    print 'row %d' % rowKey
    b.send()

pool.dispose()
------------- POPULATE FINISH -----------------
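To see exactly which index entries survive after the populate script, one option is to query the index once per written value and diff the results against what was written. A minimal sketch, assuming pycassa and a node on localhost as in the script above; `missing_values` and `indexed_values_present` are hypothetical helpers, not part of pycassa.

```python
def missing_values(expected, found):
    """Return the expected index values that the index queries did not return, in order."""
    found = set(found)
    return [v for v in expected if v not in found]


def indexed_values_present(keyspace="ks123", cf_name="cf1",
                           column="indexedColumn", n_rows=70):
    """Query the index once per value 0..n_rows-1 and collect the values that hit.

    Needs pycassa and a running node; the defaults mirror the populate script.
    """
    # Imported lazily so missing_values() is usable without pycassa installed.
    import pycassa
    from pycassa.index import create_index_clause, create_index_expression

    pool = pycassa.ConnectionPool(keyspace)
    try:
        cf = pycassa.ColumnFamily(pool, cf_name)
        present = []
        for i in range(n_rows):
            value = str(i)
            clause = create_index_clause([create_index_expression(column, value)])
            # Any returned row means the index entry for this value survived.
            if any(True for _ in cf.get_indexed_slices(clause, columns=[column])):
                present.append(value)
        return present
    finally:
        pool.dispose()
```

Calling `missing_values([str(i) for i in range(70)], indexed_values_present())` lists the row keys whose index entries are gone; per the report above, you would expect '0' through '65'.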

------------- QUERY START -----------------
[default@ks123] get cf1 where 'indexedColumn'='65';

0 Row Returned.
Elapsed time: 2.38 msec(s).

[default@ks123] get cf1 where 'indexedColumn'='66';
-------------------
RowKey: 66
=> (column=1, value=val, timestamp=1355818765548964, ttl=7884000)
...
=> (column=10087, value=val, timestamp=1355818766075538, ttl=7884000)
=> (column=indexedColumn, value=66, timestamp=1355818768119334, ttl=7887600)

1 Row Returned.
Elapsed time: 31 msec(s).
------------- QUERY FINISH -----------------

This is all using Cassandra 1.1.7 with default settings.

Best regards,

Alexei Bakanov



_________________________________________________________________________________________________________________________

This message and its attachments may contain confidential or privileged information that may be protected by law;
they should not be distributed, used or copied without authorisation.
If you have received this email in error, please notify the sender and delete this message and its attachments.
As emails may be altered, France Telecom - Orange is not liable for messages that have been modified, changed or falsified.
Thank you.