incubator-cassandra-user mailing list archives

From <cscetbon....@orange.com>
Subject Re: TTL on SecondaryIndex Columns. A bug?
Date Fri, 21 Dec 2012 15:05:08 GMT
Nice job Aaron,

AFAIU, you now set gc_before to the current time for secondary indexes. Since it was set
to Integer.MAX_VALUE before your patch, the removeDeletedStandard function was testing if
(column.getLocalDeletionTime() < MAX_VALUE), which is always true, and so was removing all
rows from the secondary index.
Am I right?
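
To make the question concrete, here is a minimal Python sketch of the comparison as I understand it. It is a simplified model, not Cassandra's actual code: the function name would_be_removed is hypothetical, and it assumes that an expiring column's local deletion time is its write time plus its TTL (in seconds) and that the patched gc_before is the time at which compaction runs. The write time and TTL are taken from the indexedColumn output quoted further down in this thread.

```python
INT_MAX = 2 ** 31 - 1  # value of Java's Integer.MAX_VALUE


def would_be_removed(local_deletion_time, gc_before):
    """Simplified stand-in for the check described above (not Cassandra's code)."""
    return local_deletion_time < gc_before


# indexedColumn for row 66 was written at timestamp 1355952223881937
# (microseconds), i.e. about 1355952223 seconds, with ttl=7887600.
write_time = 1355952223
ttl = 7887600

# Assumption: the column is due to expire at write time + TTL.
local_deletion_time = write_time + ttl

# Before the patch (as I understand it): gc_before = Integer.MAX_VALUE.
removed_before_patch = would_be_removed(local_deletion_time, INT_MAX)

# After the patch: gc_before = "now"; compaction is assumed to run right
# after the write, so the write time stands in for the current time.
removed_after_patch = would_be_removed(local_deletion_time, write_time)

print(removed_before_patch)
print(removed_after_patch)
```

Under these assumptions, a column that has not expired yet is only kept when gc_before is a real point in time rather than MAX_VALUE.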

--
Cyril SCETBON

On Dec 20, 2012, at 9:28 PM, aaron morton <aaron@thelastpickle.com> wrote:

Yes, but they will get compacted away again unless the patch is there.

It's a small patch, so you should be able to apply it easily enough if you need a fix ASAP.

Cheers

-----------------
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 20/12/2012, at 5:27 PM, B. Todd Burruss <btoddb@gmail.com> wrote:

I believe we have hit this as well. If you use nodetool
rebuild_index, does it work?

On Wed, Dec 19, 2012 at 8:10 PM, aaron morton <aaron@thelastpickle.com> wrote:
Well that was fun https://issues.apache.org/jira/browse/CASSANDRA-5079

Just testing my idea of a fix now.

Cheers
-----------------
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 20/12/2012, at 10:33 AM, aaron morton <aaron@thelastpickle.com> wrote:

Please try to run Cassandra with -Xms1927M -Xmx1927M -Xmn400M

Done, and I now get your repro case…

[default@ks123] get cf1 where 'indexedColumn'='65';

0 Row Returned.
Elapsed time: 1.44 msec(s).


[default@ks123] get cf1 where 'indexedColumn'='66';
-------------------
RowKey: 66
=> (column=1, value=val, timestamp=1355952222439049, ttl=7884000)
=> (column=10, value=val, timestamp=1355952222439269, ttl=7884000)
...
=> (column=indexedColumn, value=66, timestamp=1355952223881937, ttl=7887600)

Looking into it now.

Thanks

-----------------
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 19/12/2012, at 9:56 PM, Roland Gude <roland.gude@ez.no> wrote:

I think this might be https://issues.apache.org/jira/browse/CASSANDRA-4670
Unfortunately, apart from me, no one has been able to reproduce it yet.

Check if data is available before/after compaction
If you have leveled compaction it is hard to test because you cannot trigger
compaction manually.
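
One way to make the before/after comparison concrete is sketched below in Python. This is only an illustration under stated assumptions: missing_from_index and the lookup callable are hypothetical names, and the in-memory dict merely stands in for the index state reported in this thread (only values 66-69 still resolve). Against a real node, lookup could wrap an indexed query from your client, run once before and once after compaction.

```python
def missing_from_index(expected_values, lookup):
    """Return the indexed values for which the lookup finds no rows."""
    return [value for value in expected_values if not lookup(value)]


# In-memory stand-in for the index: only values 66-69 still resolve,
# as reported by the original poster. Not a real Cassandra query.
fake_index = dict((str(k), [str(k)]) for k in range(66, 70))


def fake_lookup(value):
    # Hypothetical replacement for a real indexed query.
    return fake_index.get(value, [])


# The populate script writes one indexed value per row key, 0 through 69.
expected = [str(k) for k in range(70)]
missing = missing_from_index(expected, fake_lookup)

print(len(missing))
```

Comparing the list returned before compaction with the one returned after would show whether compaction is what drops the entries.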

-----Original Message-----
From: Alexei Bakanov [mailto:russisk@gmail.com]
Sent: Wednesday, 19 December 2012 09:35
To: user@cassandra.apache.org
Subject: Re: TTL on SecondaryIndex Columns. A bug?

I'm running on a single node on my laptop.
It looks like the point when rows disappear from the index depends on JVM
memory settings. With more memory it needs more data to feed in before
things start disappearing.
Please try to run Cassandra with -Xms1927M -Xmx1927M -Xmn400M

To be sure, try to get rows for 'indexedColumn'='1':

[default@ks123] get cf1 where 'indexedColumn'='1';

0 Row Returned.

Thanks


On 19 December 2012 05:15, aaron morton <aaron@thelastpickle.com> wrote:

Thanks for the nice steps to reproduce.

I ran this on my MBP using C* 1.1.7 and got the expected results: both
gets returned a row.

Were you running against a single node or a cluster? If a cluster, did
you change the CL? cassandra-cli defaults to ONE.

Cheers

-----------------
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 18/12/2012, at 9:44 PM, Alexei Bakanov <russisk@gmail.com> wrote:

Hi,

We are having an issue with TTL on Secondary index columns. We get 0
rows in return when running queries on indexed columns that have TTL.
Everything works fine with small amounts of data, but when we get over
a certain threshold it looks like older rows disappear from the index.
In the example below we create 70 rows with 45k columns each + one
indexed column with just the rowkey as value, so we have one row per
indexed value. When the script is finished the index contains rows
66-69. Rows 0-65 are gone from the index.
Using 'indexedColumn' without TTL fixes the problem.


------------- SCHEMA START -----------------
create keyspace ks123
with placement_strategy = 'NetworkTopologyStrategy'
and strategy_options = {datacenter1 : 1}
and durable_writes = true;

use ks123;

create column family cf1
with column_type = 'Standard'
and comparator = 'AsciiType'
and default_validation_class = 'AsciiType'
and key_validation_class = 'AsciiType'
and read_repair_chance = 0.1
and dclocal_read_repair_chance = 0.0
and gc_grace = 864000
and min_compaction_threshold = 4
and max_compaction_threshold = 32
and replicate_on_write = true
and compaction_strategy =
'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy'
and caching = 'KEYS_ONLY'
and column_metadata = [
 {column_name : 'indexedColumn',
 validation_class : AsciiType,
 index_name : 'INDEX1',
 index_type : 0}]
and compression_options = {'sstable_compression' :
'org.apache.cassandra.io.compress.SnappyCompressor'};
------------- SCHEMA FINISH -----------------

------------- POPULATE START -----------------
from pycassa.batch import Mutator
import pycassa

pool = pycassa.ConnectionPool('ks123')
cf = pycassa.ColumnFamily(pool, 'cf1')

for rowKey in xrange(70):
    # One batch per row: 45k data columns plus the indexed column.
    b = Mutator(pool)
    for datapoint in xrange(1, 45001):
        b.insert(cf, str(rowKey), {str(datapoint): 'val'}, ttl=7884000)
    # The indexed column holds the row key and uses a slightly longer TTL.
    b.insert(cf, str(rowKey), {'indexedColumn': str(rowKey)}, ttl=7887600)
    print 'row %d' % rowKey
    b.send()

pool.dispose()
------------- POPULATE FINISH -----------------

------------- QUERY START -----------------
[default@ks123] get cf1 where 'indexedColumn'='65';

0 Row Returned.
Elapsed time: 2.38 msec(s).

[default@ks123] get cf1 where 'indexedColumn'='66';
-------------------
RowKey: 66
=> (column=1, value=val, timestamp=1355818765548964, ttl=7884000)
...
=> (column=10087, value=val, timestamp=1355818766075538, ttl=7884000)
=> (column=indexedColumn, value=66, timestamp=1355818768119334, ttl=7887600)

1 Row Returned.
Elapsed time: 31 msec(s).
------------- QUERY FINISH -----------------

This is all using Cassandra 1.1.7 with default settings.
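
For reference, a quick Python check of the two TTL values used in the populate script, to show that neither is anywhere near expiring during the test. The constant names are only for illustration; the values are copied from the script above.

```python
SECONDS_PER_DAY = 24 * 60 * 60

DATA_TTL = 7884000   # ttl on the 45k data columns
INDEX_TTL = 7887600  # ttl on 'indexedColumn'

# TTL of the data columns expressed in days.
data_ttl_days = DATA_TTL / float(SECONDS_PER_DAY)

# How much longer the indexed column lives than the data columns.
extra_seconds = INDEX_TTL - DATA_TTL

print(data_ttl_days)   # 91.25
print(extra_seconds)   # 3600
```

So the data columns live for about three months and the indexed column for one hour more, while the rows go missing from the index within minutes of being written.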

Best regards,

Alexei Bakanov








