cassandra-commits mailing list archives

From "Andrew Jorgensen (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (CASSANDRA-11842) Unbounded commit log file growth
Date Wed, 18 May 2016 21:54:13 GMT

     [ https://issues.apache.org/jira/browse/CASSANDRA-11842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Andrew Jorgensen updated CASSANDRA-11842:
-----------------------------------------
    Description: 
Today I noticed that 2 nodes in a 54-node cluster have been consuming disk space at a constant rate for the last 3 days or so.

This is a graph of disk space over the last 4 days for each of the nodes in our Cassandra cluster.
!disks-space.png|thumbnail!

When I looked into it, I found that the majority of the disk space was being used by /mnt/cassandra/commitlog. There were files dating back to 5/16, when the disk usage started to increase, and a total of ~13K commit log files in the directory.
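For anyone hitting something similar, this is roughly how I confirmed where the space was going (paths match our layout, so adjust as needed):

```shell
# Assumed layout: commit log segments live in /mnt/cassandra/commitlog.
COMMITLOG_DIR=/mnt/cassandra/commitlog

# How many segment files there are, and how much space they occupy.
ls "$COMMITLOG_DIR"/CommitLog-*.log 2>/dev/null | wc -l
du -sh "$COMMITLOG_DIR"

# Oldest segments first, to see when the growth started.
ls -ltr "$COMMITLOG_DIR" | head
```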

I was curious whether anyone has seen this before. I am not sure what would cause this behavior, especially on two separate nodes in the cluster at about the same time. I think this points to something about the data: we have a replication factor of 2, which matches the number of nodes affected.

The two nodes in question looked down from every other node's perspective when running `nodetool status`, but when running it on the affected nodes themselves, the entire cluster looked up and running.

To remedy the situation I tried running `nodetool drain` on one of the affected nodes, but it seemed to be hung and I couldn't tell whether it was doing anything. I restarted the Cassandra process and could see in the debug log that it was reading in the commit log files. After it finished reading all of them, it correctly removed the files and restored the disk space. On the second node I moved the commit log folder to a different location and restarted the node, which caused it to immediately rejoin the cluster; I can replay the queued-up commit log files later to make sure it is in a consistent state. So far the commit log on that node does not appear to be growing unboundedly.
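To make the two remediation paths concrete, this is roughly what I did on each node (command names are from memory, so treat this as a sketch rather than an exact transcript):

```shell
# Node 1: drain (hung in my case), then restart so startup replay
# reads the segments and deletes them.
nodetool drain
sudo service cassandra restart

# Node 2: set the segments aside so they can be replayed later,
# then start with an empty commit log directory.
sudo service cassandra stop
mv /mnt/cassandra/commitlog /mnt/cassandra/commitlog.bak
mkdir /mnt/cassandra/commitlog
sudo service cassandra start
```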

As far as I could tell, the data files in /mnt/cassandra/data/ for each of the keyspaces and tables had recent timestamps, which I believe means that flushing was happening and data was getting written to the SSTables; 350GB of commit log would not have fit into memory in any case.
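A rough way to check that flushing is still producing SSTables (again assuming our directory layout):

```shell
# SSTable data files modified in the last hour suggest memtables are
# still being flushed to disk.
DATA_DIR=/mnt/cassandra/data
find "$DATA_DIR" -name '*-Data.db' -mmin -60 | head
```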

In terms of settings, we do not have `commitlog_total_space_in_mb` set, so it should be whatever the default is. We do have `commitlog_segment_size_in_mb` set to 32.
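For reference, the relevant cassandra.yaml lines look roughly like this on our nodes (the total-space line is commented out, so the default applies; I believe the default is 8192 MB, or a quarter of the commit log volume if that is smaller, but please double-check that for 3.0.x):

```yaml
# commitlog_total_space_in_mb: 8192   # unset on our nodes; default applies
commitlog_segment_size_in_mb: 32
```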

If there is any other information I can provide, please let me know. I didn't see much in the Cassandra system.log or debug.log, but I would be happy to provide them if it will help.

  was:
Today I noticed that 2 nodes in a 54-node cluster have been consuming disk space at a constant rate for the last 3 days or so.

This is a graph of disk space over the last 4 days for each of the nodes in our Cassandra cluster.
!disks-space.png|thumbnail!

When I looked into it, I found that the majority of the disk space was being used by /mnt/cassandra/commitlog. There were files dating back to 5/16, when the disk usage started to increase, and a total of ~13K commit log files in the directory.

I was curious whether anyone has seen this before. I am not sure what would cause this behavior, especially on two separate nodes in the cluster at about the same time. I think this points to something about the data: we have a replication factor of 2, which matches the number of nodes affected.

The two nodes in question looked down from every other node's perspective when running `nodetool status`, but when running it on the affected nodes themselves, the entire cluster looked up and running.

To remedy the situation I tried running `nodetool drain` on one of the affected nodes, but it seemed to be hung and I couldn't tell whether it was doing anything. I restarted the Cassandra process and could see in the debug log that it was reading in the commit log files. On the second node I moved the commit log folder to a different location and restarted the node, which caused it to immediately rejoin the cluster; I can replay the queued-up commit log files later to make sure it is in a consistent state. So far the commit log on that node does not appear to be growing unboundedly.

As far as I could tell, the data files in /mnt/cassandra/data/ for each of the keyspaces and tables had recent timestamps, which I believe means that flushing was happening and data was getting written to the SSTables; 350GB of commit log would not have fit into memory in any case.

If there is any other information I can provide, please let me know. I didn't see much in the Cassandra system.log or debug.log, but I would be happy to provide them if it will help.


> Unbounded commit log file growth
> --------------------------------
>
>                 Key: CASSANDRA-11842
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11842
>             Project: Cassandra
>          Issue Type: Bug
>         Environment: Cassandra version 3.0.3 on Ubuntu Trusty
>            Reporter: Andrew Jorgensen
>         Attachments: disks-space.png
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
