cassandra-user mailing list archives

From "Caesar, Maik" <maik.cae...@dxc.com>
Subject TWCS: Repair create new buckets with old data
Date Tue, 16 Oct 2018 09:46:13 GMT
Hello,
we are running Cassandra 3.0.9 and have a problem with a table that uses TWCS. Every run of
"nodetool repair" creates new SSTable files that contain old data. This prevents the old data
from being deleted.
The table is defined as follows:
cqlsh> desc stat.spa

CREATE TABLE stat.spa (
    region int,
    id int,
    date text,
    hour int,
    zippedjsonstring blob,
    PRIMARY KEY ((region, id), date, hour)
) WITH CLUSTERING ORDER BY (date ASC, hour ASC)
    AND bloom_filter_fp_chance = 0.01
    AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
    AND comment = ''
    AND compaction = {'class': 'org.apache.cassandra.db.compaction.TimeWindowCompactionStrategy', 'compaction_window_size': '1', 'compaction_window_unit': 'DAYS', 'max_threshold': '100', 'min_threshold': '4', 'tombstone_compaction_interval': '86460'}
    AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND crc_check_chance = 1.0
    AND dclocal_read_repair_chance = 0.0
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = '99PERCENTILE';
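
As an illustration of what the one-day window setting above means, here is a minimal Python sketch of how a timestamp maps to a TWCS window. It assumes TWCS places an SSTable by its maximum write timestamp and rounds that down to the window size; the function name `twcs_window_bounds` is made up for this example, and the sample value is the maxTS that sstableexpiredblockers reports for mc-39281 further down.

```python
from datetime import datetime, timezone

SECONDS_PER_DAY = 86_400


def twcs_window_bounds(max_ts_micros: int, window_size_days: int = 1):
    """Return (lower, upper) epoch seconds of the window containing max_ts_micros.

    Assumption: the window is found by rounding the SSTable's maximum write
    timestamp (microseconds) down to a multiple of the window size.
    """
    ts_seconds = max_ts_micros // 1_000_000
    window = SECONDS_PER_DAY * window_size_days
    lower = ts_seconds - (ts_seconds % window)
    return lower, lower + window - 1


# maxTS of mc-39281 as reported by sstableexpiredblockers in this post
lower, upper = twcs_window_bounds(1532707837676719)
print(datetime.fromtimestamp(lower, tz=timezone.utc).date())  # 2018-07-27
```

Under that assumption an SSTable is assigned to the window of its newest cell, regardless of how old its oldest cell is.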

Currently, the oldest data is from 2017/04/15 and does not get removed:

$ for f in *Data.db; do
    meta=$(sudo sstablemetadata $f)
    echo -e "Max:" $(date --date=@$(echo "$meta" | grep Maximum\ time | cut -d" " -f3 | cut -c 1-10) '+%Y/%m/%d %H:%M') \
      "Min:" $(date --date=@$(echo "$meta" | grep Minimum\ time | cut -d" " -f3 | cut -c 1-10) '+%Y/%m/%d %H:%M') \
      $(echo "$meta" | grep droppable) $(echo "$meta" | grep "Repaired at") ' \t ' \
      $(ls -lh $f | awk '{print $5" "$6" "$7" "$8" "$9}')
  done | sort
Max: 2017/04/15 12:08 Min: 2017/03/31 13:09 Estimated droppable tombstones: 1.7731048805815162 Repaired at: 1525685601400   42K May 7 19:56 mc-22922-big-Data.db
Max: 2017/04/17 13:49 Min: 2017/03/31 13:09 Estimated droppable tombstones: 1.9600207684319835 Repaired at: 1525685601400   116M May 7 13:31 mc-15096-big-Data.db
Max: 2017/04/21 13:43 Min: 2017/04/15 13:34 Estimated droppable tombstones: 1.9090909090909092 Repaired at: 1525685601400   11K May 7 19:56 mc-22921-big-Data.db
Max: 2017/05/23 21:45 Min: 2017/04/21 14:00 Estimated droppable tombstones: 1.8360655737704918 Repaired at: 1525685601400   21M May 7 19:56 mc-22919-big-Data.db
Max: 2017/06/12 15:19 Min: 2017/04/25 14:45 Estimated droppable tombstones: 1.8091397849462365 Repaired at: 1525685601400   19M May 7 14:36 mc-17095-big-Data.db
Max: 2017/06/15 15:26 Min: 2017/05/10 14:37 Estimated droppable tombstones: 1.76536312849162 Repaired at: 1529612605539   9.3M Jun 21 22:31 mc-25372-big-Data.db
...

After a "nodetool repair" run, new large data files are created that include old data, for example from 2017/07/31.

Max: 2018/07/27 18:10 Min: 2017/03/31 13:13 Estimated droppable tombstones: 0.08392555471691247 Repaired at: 0   11G Sep 11 22:02 mc-39281-big-Data.db
...
Max: 2018/08/16 18:18 Min: 2018/08/06 12:19 Estimated droppable tombstones: 0.0 Repaired at: 1534525730510   123M Aug 17 23:46 mc-36847-big-Data.db
Max: 2018/08/17 19:20 Min: 2017/07/31 12:04 Estimated droppable tombstones: 0.03385963490004347 Repaired at: 0   11G Sep 11 21:43 mc-39265-big-Data.db
Max: 2018/08/17 19:20 Min: 2018/07/24 12:33 Estimated droppable tombstones: 0.0 Repaired at: 1534525730510   135M Sep 11 21:44 mc-39270-big-Data.db
...
Max: 2018/09/06 17:30 Min: 2018/08/28 12:17 Estimated droppable tombstones: 0.0 Repaired at: 1536690786879   129M Sep 11 21:10 mc-39238-big-Data.db
Max: 2018/09/07 18:22 Min: 2017/04/23 12:48 Estimated droppable tombstones: 0.1548442441468401 Repaired at: 0   8.0G Sep 11 21:33 mc-39258-big-Data.db
Max: 2018/09/07 18:22 Min: 2018/09/07 12:15 Estimated droppable tombstones: 0.0 Repaired at: 1536690786879   72M Sep 11 21:34 mc-39262-big-Data.db
Max: 2018/09/08 18:20 Min: 2018/08/22 12:17 Estimated droppable tombstones: 0.0 Repaired at: 0   2.8G Sep 11 21:47 mc-39272-big-Data.db
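
For reading the listings above: "Repaired at" is an epoch value in milliseconds, and 0 means the SSTable is still in the unrepaired set, while the minTS/maxTS values printed by the SSTable tools are write timestamps in microseconds. The following Python sketch converts both to UTC. The helper names are hypothetical, and the Max/Min columns in the listing appear to be in the server's local time (an assumption), so they can differ from UTC by the zone offset.

```python
from datetime import datetime, timezone


def repaired_at_to_utc(repaired_at_ms: int):
    """Convert a 'Repaired at' value (epoch milliseconds) to a UTC datetime.

    Returns None for 0, which marks an unrepaired SSTable.
    """
    if repaired_at_ms == 0:
        return None
    return datetime.fromtimestamp(repaired_at_ms // 1000, tz=timezone.utc)


def write_ts_to_utc(ts_micros: int):
    """Convert a write timestamp (epoch microseconds) to a UTC datetime."""
    return datetime.fromtimestamp(ts_micros // 1_000_000, tz=timezone.utc)


fmt = "%Y/%m/%d %H:%M"
print(repaired_at_to_utc(1525685601400).strftime(fmt))  # 2018/05/07 09:33
print(repaired_at_to_utc(0))                            # None
print(write_ts_to_utc(1490958782530000).strftime(fmt))  # 2017/03/31 11:13
```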

The sstableexpiredblockers tool shows that the file mc-39281-big-Data.db blocks 95 expired
SSTables from being dropped, including the oldest file, mc-22922-big-Data.db:

[BigTableReader(path='.../stat/spa-.../mc-39281-big-Data.db') (minTS = 1490958782530000, maxTS = 1532707837676719, maxLDT = 1557154990)
  blocks 95 expired sstables from getting dropped:
 [BigTableReader(path='.../stat/spa-.../mc-36936-big-Data.db') (minTS = 1500027128958000, maxTS = 1503666765807229, maxLDT = 1535202765)
[BigTableReader(path='.../stat/spa-.../mc-22921-big-Data.db') (minTS = 1492256093314000, maxTS = 1492775013454001, maxLDT = 1524311013)
[BigTableReader(path='.../stat/spa-.../mc-36947-big-Data.db') (minTS = 1492255708403000, maxTS = 1501937182477001, maxLDT = 1533473182)
[BigTableReader(path='.../stat/spa-.../mc-32582-big-Data.db') (minTS = 1493028031639000, maxTS = 1499175057476001, maxLDT = 1530711057)
[BigTableReader(path='.../stat/spa-.../mc-32560-big-Data.db') (minTS = 1500210297826000, maxTS = 1501416691390001, maxLDT = 1532952691)
[BigTableReader(path='.../stat/spa-.../mc-32528-big-Data.db') (minTS = 1490958761762000, maxTS = 1504358072394248, maxLDT = 1535894072)
[BigTableReader(path='.../stat/spa-.../mc-32572-big-Data.db') (minTS = 1500027103795000, maxTS = 1500297137808001, maxLDT = 1531833137)
[BigTableReader(path='.../stat/spa-.../mc-36935-big-Data.db') (minTS = 1500038582669000, maxTS = 1503839159485824, maxLDT = 1535375159)
[BigTableReader(path='.../stat/spa-.../mc-22922-big-Data.db') (minTS = 1490958570018000, maxTS = 1492250905633001, maxLDT = 1523786905)
[BigTableReader(path='.../stat/spa-.../mc-33470-big-Data.db') (minTS = 1499940836241000, maxTS = 1500040376685000, maxLDT = 1531576376)
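
The output above can be read with a small Python sketch of the blocking rule as I understand it. It assumes an SSTable counts as fully expired when its maxLDT is older than now minus gc_grace_seconds, and that it can only be dropped if no still-live SSTable has a minTS at or below its maxTS. The class and function names are illustrative, and the message date is used as a stand-in for "now".

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class SSTable:
    name: str
    min_ts: int   # oldest write timestamp, microseconds
    max_ts: int   # newest write timestamp, microseconds
    max_ldt: int  # max local deletion time, epoch seconds


def is_fully_expired(sstable: SSTable, gc_before: int) -> bool:
    """Assumed rule: everything in the SSTable expired before gc_before."""
    return sstable.max_ldt < gc_before


def blockers(candidate: SSTable, others: list, gc_before: int) -> list:
    """Live SSTables whose oldest data is not newer than the candidate's newest."""
    return [
        o for o in others
        if o != candidate
        and not is_fully_expired(o, gc_before)
        and o.min_ts <= candidate.max_ts
    ]


# Values copied from the sstableexpiredblockers output in this post.
big = SSTable("mc-39281", 1490958782530000, 1532707837676719, 1557154990)
old = SSTable("mc-22922", 1490958570018000, 1492250905633001, 1523786905)

# Assumption: "now" is the message date; gc_grace_seconds is from the table definition.
now = int(datetime(2018, 10, 16, 9, 46, 13, tzinfo=timezone.utc).timestamp())
gc_before = now - 864000

print(is_fully_expired(old, gc_before))                        # True
print([b.name for b in blockers(old, [old, big], gc_before)])  # ['mc-39281']
```

With these numbers mc-22922 is expired, but mc-39281 reaches back to a minTS earlier than mc-22922's maxTS, which matches what the tool reports.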

Why does the repair cause such churn in new data files, and how can we remove the old data?

Kind Regards

Maik Cäsar



