cassandra-commits mailing list archives

From "Donald Smith (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (CASSANDRA-5220) Repair improvements when using vnodes
Date Mon, 23 Dec 2013 20:50:51 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-5220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13855774#comment-13855774 ]

Donald Smith edited comment on CASSANDRA-5220 at 12/23/13 8:49 PM:
-------------------------------------------------------------------

 We ran "nodetool repair" on a 3 node cassandra cluster with production-quality hardware,
using version 2.0.3. Each node had about 1TB of data. This is still testing.  After 5 days
the repair job still hasn't finished. I can see it's still running.

Here's the process:
{noformat}
root     30835 30774  0 Dec17 pts/0    00:03:53 /usr/bin/java -cp /etc/cassandra/conf:/usr/share/java/jna.jar:/usr/share/cassandra/lib/antlr-3.2.jar:/usr/share/cassandra/lib/apache-cassandra-2.0.3.jar:/usr/share/cassandra/lib/apache-cassandra-clientutil-2.0.3.jar:/usr/share/cassandra/lib/apache-cassandra-thrift-2.0.3.jar:/usr/share/cassandra/lib/commons-cli-1.1.jar:/usr/share/cassandra/lib/commons-codec-1.2.jar:/usr/share/cassandra/lib/commons-lang3-3.1.jar:/usr/share/cassandra/lib/compress-lzf-0.8.4.jar:/usr/share/cassandra/lib/concurrentlinkedhashmap-lru-1.3.jar:/usr/share/cassandra/lib/disruptor-3.0.1.jar:/usr/share/cassandra/lib/guava-15.0.jar:/usr/share/cassandra/lib/high-scale-lib-1.1.2.jar:/usr/share/cassandra/lib/jackson-core-asl-1.9.2.jar:/usr/share/cassandra/lib/jackson-mapper-asl-1.9.2.jar:/usr/share/cassandra/lib/jamm-0.2.5.jar:/usr/share/cassandra/lib/jbcrypt-0.3m.jar:/usr/share/cassandra/lib/jline-1.0.jar:/usr/share/cassandra/lib/json-simple-1.1.jar:/usr/share/cassandra/lib/libthrift-0.9.1.jar:/usr/share/cassandra/lib/log4j-1.2.16.jar:/usr/share/cassandra/lib/lz4-1.2.0.jar:/usr/share/cassandra/lib/metrics-core-2.2.0.jar:/usr/share/cassandra/lib/netty-3.6.6.Final.jar:/usr/share/cassandra/lib/reporter-config-2.1.0.jar:/usr/share/cassandra/lib/servlet-api-2.5-20081211.jar:/usr/share/cassandra/lib/slf4j-api-1.7.2.jar:/usr/share/cassandra/lib/slf4j-log4j12-1.7.2.jar:/usr/share/cassandra/lib/snakeyaml-1.11.jar:/usr/share/cassandra/lib/snappy-java-1.0.5.jar:/usr/share/cassandra/lib/snaptree-0.1.jar:/usr/share/cassandra/lib/stress.jar:/usr/share/cassandra/lib/thrift-server-0.3.2.jar -Xmx32m -Dlog4j.configuration=log4j-tools.properties -Dstorage-config=/etc/cassandra/conf org.apache.cassandra.tools.NodeCmd -p 7199 repair -pr as_reports
{noformat}
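
For reference, -pr restricts the repair to this node's primary ranges; the work per invocation can also be bounded by naming a single column family after the keyspace (a sketch of the 2.0.x syntax, using a table from this keyspace):
{noformat}
# Repair one column family at a time to bound each run (illustrative):
nodetool -p 7199 repair -pr as_reports data_report_details
{noformat}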

The log output contains only:
{noformat}
xss =  -ea -javaagent:/usr/share/cassandra/lib/jamm-0.2.5.jar -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xms8192M -Xmx8192M -Xmn2048M -XX:+HeapDumpOnOutOfMemoryError -Xss256k
[2013-12-17 23:26:48,144] Starting repair command #1, repairing 256 ranges for keyspace as_reports
{noformat}
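
The 256 ranges come from the default num_tokens vnode setting, and pre-2.1 repair runs one session per range, sequentially. A rough back-of-envelope, where the per-range time is an assumption rather than a measurement:
{noformat}
256 ranges x ~30 min/range (merkle tree build + exchange + streaming, assumed)
  = ~128 hours = ~5.3 days, consistent with a repair still unfinished after 5 days
{noformat}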

Here's the output of "nodetool tpstats":
{noformat}
cass3 /tmp> nodetool tpstats
xss =  -ea -javaagent:/usr/share/cassandra/lib/jamm-0.2.5.jar -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xms8192M -Xmx8192M -Xmn2048M -XX:+HeapDumpOnOutOfMemoryError -Xss256k
Pool Name                    Active   Pending      Completed   Blocked  All time blocked
ReadStage                         1         0       38083403         0                 0
RequestResponseStage              0         0     1951200451         0                 0
MutationStage                     0         0     2853354069         0                 0
ReadRepairStage                   0         0        3794926         0                 0
ReplicateOnWriteStage             0         0              0         0                 0
GossipStage                       0         0        4880147         0                 0
AntiEntropyStage                  1         3              9         0                 0
MigrationStage                    0         0             30         0                 0
MemoryMeter                       0         0            115         0                 0
MemtablePostFlusher               0         0          75121         0                 0
FlushWriter                       0         0          49934         0                52
MiscStage                         0         0              0         0                 0
PendingRangeCalculator            0         0              7         0                 0
commitlog_archiver                0         0              0         0                 0
AntiEntropySessions               1         1              1         0                 0
InternalResponseStage             0         0              9         0                 0
HintedHandoff                     0         0           1141         0                 0

Message type           Dropped
RANGE_SLICE                  0
READ_REPAIR                  0
PAGED_RANGE                  0
BINARY                       0
READ                       884
MUTATION               1407711
_TRACE                       0
REQUEST_RESPONSE             0
{noformat}
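
To see whether AntiEntropyStage ever drains, a simple polling loop is enough (a sketch; host, port, and interval are illustrative):
{noformat}
# Log the anti-entropy pools every 10 minutes:
while true; do
    date
    nodetool -p 7199 tpstats | egrep 'Pool Name|AntiEntropy'
    sleep 600
done
{noformat}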
The cluster receives live write traffic; we deliberately tested repair under load.
This is the busiest column family, as reported by "nodetool cfstats":
{noformat}
   Read Count: 38084316
        Read Latency: 9.409910464927346 ms.
        Write Count: 2850436738
        Write Latency: 0.8083138546641199 ms.
        Pending Tasks: 0
....
    Table: data_report_details
                SSTable count: 592
                Space used (live), bytes: 160644106183
                Space used (total), bytes: 160663248847
                SSTable Compression Ratio: 0.5296494510512617
                Number of keys (estimate): 51015040
                Memtable cell count: 311180
                Memtable data size, bytes: 46275953
                Memtable switch count: 6100
                Local read count: 6147
                Local read latency: 154.539 ms
                Local write count: 750865416
                Local write latency: 0.029 ms
                Pending tasks: 0
                Bloom filter false positives: 265
                Bloom filter false ratio: 0.06009
                Bloom filter space used, bytes: 64690104
                Compacted partition minimum bytes: 30
                Compacted partition maximum bytes: 10090808
                Compacted partition mean bytes: 5267
                Average live cells per slice (last five minutes): 1.0
                Average tombstones per slice (last five minutes): 0.0
{noformat}
We're going to restart the node. We rarely do deletes or updates (only when a report is re-uploaded),
so we suspect we can get by without running repairs; correct me if I'm wrong about that.
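
For context on that question: deletes must be repaired within the table's gc_grace_seconds, or tombstones can be missed and deleted data can resurrect. One way to check the setting (a sketch, querying the 2.0.x system tables via cqlsh):
{noformat}
# gc_grace_seconds for the busiest table above:
echo "SELECT gc_grace_seconds FROM system.schema_columnfamilies \
      WHERE keyspace_name='as_reports' AND columnfamily_name='data_report_details';" | cqlsh
{noformat}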

Here's the output of "nodetool compactionstats":
{noformat}
xss =  -ea -javaagent:/usr/share/cassandra/lib/jamm-0.2.5.jar -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xms8192M -Xmx8192M -Xmn2048M -XX:+HeapDumpOnOutOfMemoryError -Xss256k
pending tasks: 166
          compaction type     keyspace                                 table    completed        total    unit  progress
               Compaction   as_reports   data_report_details_below_threshold    971187148   1899419306   bytes    51.13%
               Compaction   as_reports   data_report_details_below_threshold    950086203   1941500979   bytes    48.94%
               Compaction   as_reports                data_hierarchy_details   2968934609   5808990354   bytes    51.11%
               Compaction   as_reports   data_report_details_below_threshold    945816183   1900166474   bytes    49.78%
               Compaction   as_reports   data_report_details_below_threshold    899143344   1943534395   bytes    46.26%
               Compaction   as_reports   data_report_details_below_threshold    856329840   1946566670   bytes    43.99%
               Compaction   as_reports                   data_report_details    195235688    915395763   bytes    21.33%
               Compaction   as_reports   data_report_details_below_threshold    982460217   1931001761   bytes    50.88%
               Compaction   as_reports   data_report_details_below_threshold    896609409   1931075688   bytes    46.43%
               Compaction   as_reports   data_report_details_below_threshold    869219044   1928977382   bytes    45.06%
               Compaction   as_reports   data_report_details_below_threshold    870931112   1901729646   bytes    45.80%
               Compaction   as_reports   data_report_details_below_threshold    879343635   1939491280   bytes    45.34%
               Compaction   as_reports   data_report_details_below_threshold    981888944   1893024439   bytes    51.87%
               Compaction   as_reports   data_report_details_below_threshold    871785587   1884652607   bytes    46.26%
               Compaction   as_reports   data_report_details_below_threshold    902340327   1913280943   bytes    47.16%
               Compaction   as_reports   data_report_details_below_threshold   1025069846   1901568674   bytes    53.91%
               Compaction   as_reports   data_report_details_below_threshold    920112020   1893272832   bytes    48.60%
               Compaction   as_reports                data_hierarchy_details   2962138268   5774762866   bytes    51.29%
               Compaction   as_reports   data_report_details_below_threshold    790782860   1918640911   bytes    41.22%
               Compaction   as_reports                data_hierarchy_details   2972501409   5885217724   bytes    50.51%
               Compaction   as_reports   data_report_details_below_threshold   1611697659   1939040337   bytes    83.12%
               Compaction   as_reports   data_report_details_below_threshold    943130526   1943713837   bytes    48.52%
               Compaction   as_reports   data_report_details_below_threshold    911127302   1952885196   bytes    46.66%
               Compaction   as_reports   data_report_details_below_threshold    911230087   1927967871   bytes    47.26%
{noformat}
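
With 166 compactions pending, compaction itself looks saturated, and repair validation work has to compete with it. If one wanted to let compaction catch up, the throughput cap can be changed at runtime (a sketch; 0 means unthrottled, and the 2.0 default is 16 MB/s):
{noformat}
nodetool -p 7199 setcompactionthroughput 0    # remove the throttle while catching up
nodetool -p 7199 setcompactionthroughput 16   # restore the default afterwards
{noformat}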

Now "nodetool tpstats" says:
{noformat}
Pool Name                    Active   Pending      Completed   Blocked  All time blocked
AntiEntropyStage                  1         3              9         0                 0
{noformat}

We ran "nodetool repair -pr" on 10.1.40.43. Here are references to it. So, maybe the nodetool
repair job hung. 
{noformat}
cass3 /var/log/cassandra> grep -i repair system.log.? | grep -i merkle
system.log.1: INFO [AntiEntropySessions:1] 2013-12-17 23:26:48,459 RepairJob.java (line 116) [repair #c1540f60-67b5-11e3-b8b7-fb178cd88033] requesting merkle trees for data_report_details_by_uus (to [/10.1.40.42, dc1-cassandra-staging-03.dc01.revsci.net/10.1.40.43])
system.log.1: INFO [AntiEntropyStage:1] 2013-12-17 23:26:48,807 RepairSession.java (line 157) [repair #c1540f60-67b5-11e3-b8b7-fb178cd88033] Received merkle tree for data_report_details_by_uus from /10.1.40.42
system.log.1: INFO [AntiEntropyStage:1] 2013-12-17 23:26:49,091 RepairSession.java (line 157) [repair #c1540f60-67b5-11e3-b8b7-fb178cd88033] Received merkle tree for data_report_details_by_uus from /10.1.40.43
system.log.1: INFO [AntiEntropyStage:1] 2013-12-19 03:58:31,007 RepairJob.java (line 116) [repair #c1540f60-67b5-11e3-b8b7-fb178cd88033] requesting merkle trees for data_hierarchy_details (to [/10.1.40.42, dc1-cassandra-staging-03.dc01.revsci.net/10.1.40.43])
system.log.1: INFO [AntiEntropySessions:5] 2013-12-19 03:58:31,012 RepairJob.java (line 116) [repair #e0ff9ba0-68a4-11e3-b8b7-fb178cd88033] requesting merkle trees for data_report_details_by_uus (to [/10.1.40.41, dc1-cassandra-staging-03.dc01.revsci.net/10.1.40.43])
system.log.1: INFO [AntiEntropyStage:1] 2013-12-19 03:58:31,316 RepairSession.java (line 157) [repair #e0ff9ba0-68a4-11e3-b8b7-fb178cd88033] Received merkle tree for data_report_details_by_uus from /10.1.40.41
system.log.1: INFO [AntiEntropyStage:1] 2013-12-19 03:58:31,431 RepairSession.java (line 157) [repair #e0ff9ba0-68a4-11e3-b8b7-fb178cd88033] Received merkle tree for data_report_details_by_uus from /10.1.40.43
{noformat}
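
One way to tell a hung repair from a merely slow one (a sketch; both subcommands exist in 2.0.x): merkle tree construction appears as Validation rows in compactionstats, and repair streaming appears in netstats. If both stay idle for hours while AntiEntropySessions shows an active task, the session is probably stuck.
{noformat}
nodetool -p 7199 compactionstats | grep -i validation   # merkle tree builds in progress
nodetool -p 7199 netstats                               # repair streaming between replicas
{noformat}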



> Repair improvements when using vnodes
> -------------------------------------
>
>                 Key: CASSANDRA-5220
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5220
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 1.2.0 beta 1
>            Reporter: Brandon Williams
>            Assignee: Yuki Morishita
>             Fix For: 2.1
>
>
> Currently when using vnodes, repair takes much longer to complete than without them.
> This appears at least in part to be because it uses a session per range and processes them
> sequentially. This generates a lot of log spam with vnodes, and while the sequential approach
> is gentler on hard-disk deployments, SSD-based deployments would often prefer that repair be
> as fast as possible.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
