cassandra-user mailing list archives

From Jean Tremblay <jean.tremb...@zen-innovations.com>
Subject Re: Nodetool repair with Load times 5
Date Wed, 19 Aug 2015 07:15:20 GMT
Dear Alain,

Thanks again for your precious help.

I might be able to help, but I need to know what you have done recently (changed the RF, added/removed nodes, cleanups, anything else, with as much detail as possible...)

I have a cluster of 5 nodes, all running Cassandra 2.1.8.
My schema is fixed and never changes. I have not changed the RF; it is 3. I have not removed
nodes, and I have run no cleanups.

Basically here are the important operations I have done:

- Installed Cassandra 2.1.7 on a cluster of 5 nodes with RF 3, using Size-Tiered compaction.
- Inserted 2 billion rows (bulk load).
- Ran loads of select statements… verified that the data is good.
- Did some deletes and a bit more inserts.
- Eventually migrated to 2.1.8.
- After that, only very few deletes/inserts.
- Took a few snapshots.

When I ran “nodetool status” I always saw a load of about 200 GB on **all** nodes.

- Then I ran “nodetool -h node0 repair -par -pr -inc”, and after that I had a completely
different picture.

nodetool -h zennode0 status
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address        Load       Tokens  Owns    Host ID                               Rack
UN  192.168.2.104  941.49 GB  256     ?       c13e0858-091c-47c4-8773-6d6262723435  rack1
UN  192.168.2.100  1.07 TB    256     ?       c32a9357-e37e-452e-8eb1-57d86314b419  rack1
UN  192.168.2.101  189.72 GB  256     ?       9af90dea-90b3-4a8a-b88a-0aeabe3cea79  rack1
UN  192.168.2.102  948.61 GB  256     ?       8eb7a5bb-6903-4ae1-a372-5436d0cc170c  rack1
UN  192.168.2.103  197.27 GB  256     ?       9efc6f13-2b02-4400-8cde-ae831feb86e9  rack1


Also, could you please run "nodetool status myks" for your keyspace(s)? We will then be
able to see the theoretical ownership of each node for your distinct (or unique) keyspace(s).

nodetool -h zennode0 status XYZdata
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address        Load       Tokens  Owns (effective)  Host ID                               Rack
UN  192.168.2.104  941.49 GB  256     62.5%             c13e0858-091c-47c4-8773-6d6262723435  rack1
UN  192.168.2.100  1.07 TB    256     58.4%             c32a9357-e37e-452e-8eb1-57d86314b419  rack1
UN  192.168.2.101  189.72 GB  256     58.4%             9af90dea-90b3-4a8a-b88a-0aeabe3cea79  rack1
UN  192.168.2.102  948.61 GB  256     60.1%             8eb7a5bb-6903-4ae1-a372-5436d0cc170c  rack1
UN  192.168.2.103  197.27 GB  256     60.6%             9efc6f13-2b02-4400-8cde-ae831feb86e9  rack1
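As an aside, the Tokens column (256) indicates vnodes are in use, and the ownership numbers
above are consistent with that: with RF 3 spread evenly over 5 nodes, each node should own
roughly 3/5 = 60% effectively. A quick sanity check (a sketch only; the figures are copied
from the status output above):

```python
# Sanity check of the "Owns (effective)" column above.
# With vnodes and an even token distribution, expected effective
# ownership per node is RF / node_count.
rf = 3
node_count = 5
expected = rf / node_count  # 0.6 -> 60%

# Effective ownership reported by "nodetool status XYZdata"
observed = {
    "192.168.2.104": 0.625,
    "192.168.2.100": 0.584,
    "192.168.2.101": 0.584,
    "192.168.2.102": 0.601,
    "192.168.2.103": 0.606,
}

# Random vnode placement makes each node deviate a little from 60%.
for node, own in observed.items():
    print(f"{node}: {own:.1%} (deviation {own - expected:+.1%})")

# The ownership figures should sum to RF * 100%.
print(f"total: {sum(observed.values()):.1%} (expected {rf:.0%})")
```

So the ownership side looks healthy; the anomaly is only in the Load column.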


Some ideas:

You repaired only the primary range ("-pr") of one node. With an RF of 3 and 3 big nodes,
if you are not using vnodes, this would be almost normal (except for the gap 200 GB --> 1 TB,
which is huge, unless you messed up the RF). So, are you using them?

My schema is completely fixed and I have used RF 3 since the beginning. Sorry, I’m not too
acquainted with vnodes. I have not changed anything in cassandra.yaml except the seeds and
the name of the cluster.

2/ Load is basically the size of the data on each node

If it is the size of the data how can it fit on the disk?
My 5 nodes have an SSD drive of 1 TB and here is the disk usage for each of them:

node0: 25%
node1: 25%
node2: 24%
node3: 26%
node4: 29%

nodetool status says that the load for node0 is 1.07 TB. That is more than its disk can
hold, yet the disk usage for node0 is only 25%.

This is not clear to me… the Load in the nodetool status output seems to be more than “the
size of the data on a node”.
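One way to make the inconsistency concrete is to compare the reported Load against what
the disk actually holds. A small plausibility check (a sketch, not Cassandra tooling; the
1 TB capacity and 25% usage figures for node0 are the ones quoted above):

```python
# Plausibility check for node0: reported Load vs. what is on disk.
# Figures quoted in the thread: 1 TB SSD, 25% used, Load 1.07 TB.
# All sizes in GB, with 1 TB = 1024 GB.
disk_capacity_gb = 1024          # 1 TB SSD
disk_usage = 0.25                # 25% used
reported_load_gb = 1.07 * 1024   # Load from "nodetool status"

on_disk_gb = disk_capacity_gb * disk_usage  # ~256 GB actually on disk

# Load claims roughly 4x more data than the disk holds, so the Load
# figure cannot be reflecting live on-disk data here.
ratio = reported_load_gb / on_disk_gb
print(f"on disk: {on_disk_gb:.0f} GB, reported: {reported_load_gb:.0f} GB, "
      f"ratio: {ratio:.1f}x")
```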


On 18 Aug 2015, at 19:29, Alain RODRIGUEZ <arodrime@gmail.com> wrote:

Hi Jean,

I might be able to help, but I need to know what you have done recently (changed the RF, added/removed nodes, cleanups, anything else, with as much detail as possible...)

Also, could you please run "nodetool status myks" for your keyspace(s)? We will then be
able to see the theoretical ownership of each node for your distinct (or unique) keyspace(s).

Some ideas:

You repaired only the primary range ("-pr") of one node. With an RF of 3 and 3 big nodes,
if you are not using vnodes, this would be almost normal (except for the gap 200 GB --> 1 TB,
which is huge, unless you messed up the RF). So, are you using them?

Answers:

1/ It depends on what happened to this cluster (see my questions above).
2/ Load is basically the size of the data on each node.
3/ No, this is neither a normal nor a stable situation.
4/ No, -pr means you repaired only the partition range that node is responsible for (it depends
on the tokens); you have to run this on all nodes. But I would wait to find out what's happening
first, to avoid hitting the threshold on disk space or whatever.

I guess I was confused by the -par switch, which suggested to me that the work would be
done in parallel and therefore on all nodes.

So if I understand correctly, one should run “nodetool repair -par -pr -inc” on all nodes,
one after the other? Is this correct?
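If that understanding is right, the sequence can be sketched as follows (an illustration
only; the addresses are the five nodes from the status output above, and each repair should
finish before the next starts):

```python
# Sketch: the per-node repair commands needed to cover the whole ring
# when using -pr (primary range only). With -pr, every node must be
# repaired in turn; -par only parallelizes the work of one repair
# across its replicas, it does not repair other nodes' primary ranges.
nodes = [
    "192.168.2.100",
    "192.168.2.101",
    "192.168.2.102",
    "192.168.2.103",
    "192.168.2.104",
]

commands = [f"nodetool -h {node} repair -par -pr -inc" for node in nodes]

for cmd in commands:
    print(cmd)  # run these sequentially, waiting for each to finish
```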

————————————
I have a second, smaller cluster with only three nodes, configured exactly the same way
except that it has other seeds and another cluster name. I have done basically the same
operations as I listed above.

On that second cluster, before I ran the "nodetool repair", the loads reported by “nodetool
status” were around 350 GB for each node.

After the “nodetool repair” they jumped to:

nodetool status XYZdata
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address        Load       Tokens  Owns (effective)  Host ID                               Rack
UN  192.168.2.200  2.16 TB    256     100.0%            1c28329b-f247-4e37-9664-85ee4492c46b  rack1
UN  192.168.2.201  4.09 TB    256     100.0%            5afdb2cf-a57c-4b58-8f1f-65287b11dc3b  rack1
UN  192.168.2.202  4.68 TB    256     100.0%            8e296000-809f-4fbc-a508-fcf7020556bd  rack1
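The jump can be quantified from the figures above. With RF 3 on 3 nodes every node owns
100% of the data, so all three loads should stay roughly equal; instead they diverged and
grew by up to an order of magnitude (a rough calculation, figures copied from the status
output):

```python
# Per-node load before vs. after repair on the 3-node cluster
# (all figures quoted from the thread; GB, with 1 TB = 1024 GB).
before_gb = 350  # roughly equal on all three nodes before repair

after_gb = {
    "192.168.2.200": 2.16 * 1024,
    "192.168.2.201": 4.09 * 1024,
    "192.168.2.202": 4.68 * 1024,
}

# With RF 3 on 3 nodes each node owns 100%, so loads should stay
# roughly equal -- instead they diverged and grew 6x to 14x.
for node, gb in after_gb.items():
    print(f"{node}: {gb:.0f} GB ({gb / before_gb:.1f}x the pre-repair load)")
```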

——————————

Aside from these very strange results on both clusters, the systems seem to behave properly
as far as select statements are concerned. Performance seems slightly reduced.

Mark Greene asked me if I had run a nodetool cleanup. I tried it to see if it would change
anything. It returned instantly, which is odd.

Looking at the log files I see this:

nodetool -h nodet0 cleanup
INFO  [RMI TCP Connection(1538)-192.168.2.200] 2015-08-19 09:08:33,897 CompactionManager.java:264 - No sstables for system_traces.sessions
INFO  [RMI TCP Connection(1538)-192.168.2.200] 2015-08-19 09:08:33,898 CompactionManager.java:264 - No sstables for system_traces.events

!!!!

—————

Looking at the log files since I did the repair… I observe no errors, and only two types
of warnings…

1) Warnings related to tombstones… which I think is normal.

WARN  [SharedPool-Worker-1] 2015-08-19 09:02:01,696 SliceQueryFilter.java:319 - Read 1329 live and 1097 tombstone cells in XYZdata.co_rep_pcode for key: D:01 (see tombstone_warn_threshold). 5000 columns were requested, slices=[9:201001-9:201412:!]
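For what it's worth, the logged counts can be turned into a tombstone ratio with a small
parse (a sketch; the regex matches the 2.1 SliceQueryFilter message format as quoted above):

```python
import re

# Parse the live/tombstone cell counts out of the SliceQueryFilter
# warning quoted above (message format as logged by Cassandra 2.1).
line = ("WARN  [SharedPool-Worker-1] 2015-08-19 09:02:01,696 "
        "SliceQueryFilter.java:319 - Read 1329 live and 1097 tombstone "
        "cells in XYZdata.co_rep_pcode for key: D:01")

m = re.search(r"Read (\d+) live and (\d+) tombstone cells", line)
live, tombstones = int(m.group(1)), int(m.group(2))

# Almost half the cells scanned for this key were tombstones.
ratio = tombstones / (live + tombstones)
print(f"live={live} tombstones={tombstones} tombstone ratio={ratio:.0%}")
```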

2) Warnings related to partition size, which is probably related to my problem.

WARN  [CompactionExecutor:1600] 2015-08-19 03:18:00,869 SSTableWriter.java:240 - Compacting large partition XYZdata/co_rep_pcode:D:820231 (117964258 bytes)

———


This has a very bad smell even though my system **seems** to work well. I’m quite lost and
I will most probably reload all my data in both clusters.

Again, Alain, thanks for your help.

Kind regards

Jean






Anyway, see if you can give us more info related to this.

C*heers,

Alain



2015-08-18 14:40 GMT+02:00 Jean Tremblay <jean.tremblay@zen-innovations.com>:
No. I did not try.
I would like to understand what is going on before I make my problem, maybe even worse.

I really would like to understand:

1) Is this normal?
2) What is the meaning of the column Load?
3) Is there anything to fix? Can I leave it like that?
4) Did I do something wrong? When you use -par, you only need to run repair from one node,
right? E.g. nodetool -h 192.168.2.100 repair -pr -par -inc

Thanks for your feedback.

Jean

On 18 Aug 2015, at 14:33, Mark Greene <greenemj@gmail.com> wrote:

Hey Jean,

Did you try running a nodetool cleanup on all your nodes, perhaps one at a time?

On Tue, Aug 18, 2015 at 3:59 AM, Jean Tremblay <jean.tremblay@zen-innovations.com> wrote:
Hi,

I have a phenomena I cannot explain, and I would like to understand.

I’m running Cassandra 2.1.8 on a cluster of 5 nodes.
I’m using replication factor 3, with most default settings.

Last week I ran a nodetool status, which gave me a load of about 200 GB on each node.
Since then there have been no deletes and no inserts.

This weekend I did a nodetool -h 192.168.2.100 repair -pr -par -inc

And now when I run a nodetool status I see a completely new picture!!

nodetool -h zennode0 status
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address        Load       Tokens  Owns    Host ID                               Rack
UN  192.168.2.104  940.73 GB  256     ?       c13e0858-091c-47c4-8773-6d6262723435  rack1
UN  192.168.2.100  1.07 TB    256     ?       c32a9357-e37e-452e-8eb1-57d86314b419  rack1
UN  192.168.2.101  189.03 GB  256     ?       9af90dea-90b3-4a8a-b88a-0aeabe3cea79  rack1
UN  192.168.2.102  951.28 GB  256     ?       8eb7a5bb-6903-4ae1-a372-5436d0cc170c  rack1
UN  192.168.2.103  196.54 GB  256     ?       9efc6f13-2b02-4400-8cde-ae831feb86e9  rack1

The nodes 192.168.2.101 and 103 are about what they were last week, but now the three other
nodes have a load which is about 5 times bigger!

1) Is this normal?
2) What is the meaning of the column Load?
3) Is there anything to fix? Can I leave it like that?

It is strange that I’m asking how to fix things after I ran a *repair*.

Thanks a lot for your help.

Kind regards

Jean



