cassandra-user mailing list archives

From Narendra Sharma <narendra.sha...@gmail.com>
Subject Re: Cassandra 1.1.6 - Disk usage and Load displayed in ring doesn't match
Date Wed, 18 Dec 2013 19:12:40 GMT
Thanks, Aaron. There are no -tmp- files, and not a single exception in
system.log.

If the file was last modified on 20-Nov, then there must be an entry for
it in the log (either a completed stream or a compaction).
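As a sketch of that log search: a compacted or streamed sstable normally shows up in the log under its generation number, the integer in its filename. The sstable name and log path below are hypothetical placeholders.

```shell
# Hypothetical sstable name; the generation is the integer before -Data.db.
SSTABLE="MyKeyspace-MyCF-hf-1234-Data.db"

# Pull out the generation number from the filename.
GEN=$(printf '%s\n' "$SSTABLE" | sed -E 's/.*[^0-9]([0-9]+)-Data\.db$/\1/')
echo "generation=$GEN"

# Then search the log for compaction or streaming lines mentioning it, e.g.:
# grep -E "(Compact|Stream).*-${GEN}-" /var/log/cassandra/system.log
```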


On Tue, Dec 17, 2013 at 7:23 PM, Aaron Morton <aaron@thelastpickle.com> wrote:

> -tmp- files sit in the data dir; if there was an error creating them during
> compaction or flushing to disk, they will stay around until a restart.
>
> Check the logs for errors to see if compaction was failing on something.
>
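A minimal sketch of that check, assuming the default package locations (adjust DATA_DIR and LOG for your install):

```shell
# Default Cassandra package locations; adjust for your install.
DATA_DIR="${DATA_DIR:-/var/lib/cassandra/data}"
LOG="${LOG:-/var/log/cassandra/system.log}"

# Orphaned temporary sstables left behind by a failed compaction/flush:
find "$DATA_DIR" -name '*-tmp-*' 2>/dev/null || true

# Errors or exceptions around compaction/flush in the log:
grep -E 'ERROR|Exception' "$LOG" 2>/dev/null | grep -iE 'compact|flush' || true
```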
> Cheers
>
> -----------------
> Aaron Morton
> New Zealand
> @aaronmorton
>
> Co-Founder & Principal Consultant
> Apache Cassandra Consulting
> http://www.thelastpickle.com
>
> On 17/12/2013, at 12:28 pm, Narendra Sharma <narendra.sharma@gmail.com>
> wrote:
>
> No snapshots.
>
> I restarted the node and now the Load in ring is in sync with the disk
> usage. Not sure what caused it to go out of sync. However, the Live SSTable
> count doesn't match exactly with the number of data files on disk.
>
> I am going through the Cassandra code to understand what could be the
> reason for the mismatch in the sstable count, and also why there is no
> reference to some of the data files in system.log.
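That count comparison can be scripted as a quick sketch; the data directory path is a placeholder, and the awk simply pulls the first "SSTable count" line out of cfstats output:

```shell
# Placeholder path -- point DATA_DIR at the column family's data directory.
DATA_DIR="${DATA_DIR:-/var/lib/cassandra/data/MyKeyspace/MyCF}"

# Data files physically on disk:
ON_DISK=$(ls "$DATA_DIR"/*-Data.db 2>/dev/null | wc -l)

# Live sstables as Cassandra sees them (empty if nodetool is unavailable):
LIVE=$(nodetool cfstats 2>/dev/null | awk '/SSTable count/ {print $3; exit}')

echo "data-files-on-disk=$ON_DISK live-sstables=${LIVE:-unknown}"
```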
>
>
>
>
> On Mon, Dec 16, 2013 at 2:45 PM, Arindam Barua <abarua@247-inc.com> wrote:
>
>>
>>
>> Do you have any snapshots on the nodes where you are seeing this issue?
>>
>> Snapshots hard-link sstables, which prevents them from being deleted.
>>
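A quick way to check for this, as a sketch with the default data path assumed: snapshot directories live under each CF's data directory, and a link count above 1 on a -Data.db file means a snapshot (or some other hard link) still pins it, so deleting the live copy frees no space.

```shell
DATA_DIR="${DATA_DIR:-/var/lib/cassandra/data}"   # adjust for your install

# Any snapshot directories present?
find "$DATA_DIR" -type d -name snapshots 2>/dev/null || true

# Data files still pinned by an extra hard link:
find "$DATA_DIR" -name '*-Data.db' -links +1 2>/dev/null || true

# To drop all snapshots (uncomment to run):
# nodetool clearsnapshot
```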
>>
>>
>> -Arindam
>>
>>
>>
>> From: Narendra Sharma [mailto:narendra.sharma@gmail.com]
>> Sent: Sunday, December 15, 2013 1:15 PM
>> To: user@cassandra.apache.org
>> Subject: Cassandra 1.1.6 - Disk usage and Load displayed in ring
>> doesn't match
>>
>>
>>
>> We have an 8-node cluster. Replication factor is 3.
>>
>>
>>
>> For some of the nodes, the disk usage (du -ksh .) in the data directory
>> for the CF doesn't match the Load reported by the nodetool ring command.
>> When we expanded the cluster from 4 nodes to 8 nodes (4 weeks back),
>> everything was okay. Over the last 2-3 weeks the disk usage has gone up.
>> We increased the RF from 2 to 3 two weeks ago.
>>
>>
>>
>> I am not sure if increasing the RF is causing this issue.
>>
>>
>>
>> For one of the nodes that I analyzed:
>>
>> 1. nodetool ring reported load as 575.38 GB
>>
>>
>>
>> 2. nodetool cfstats for the CF reported:
>>
>> SSTable count: 28
>>
>> Space used (live): 572671381955
>>
>> Space used (total): 572671381955
>>
>>
>>
>>
>>
>> 3. 'ls -1 *Data* | wc -l' in the data folder for CF returned
>>
>> 46
>>
>>
>>
>> 4. 'du -ksh .' in the data folder for CF returned
>>
>> 720G
>>
>>
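Putting the numbers above side by side, as a quick awk sketch using the figures from this message (GiB is used here because du -k counts 1024-based blocks):

```shell
LIVE_BYTES=572671381955   # "Space used (live)" from nodetool cfstats
DU_GIB=720                # "du -ksh ." reported 720G

# Convert the cfstats byte count to whole GiB.
LIVE_GIB=$(awk -v b="$LIVE_BYTES" 'BEGIN { printf "%d", b / (1024*1024*1024) }')
GAP_GIB=$((DU_GIB - LIVE_GIB))

echo "live=${LIVE_GIB}GiB on-disk=${DU_GIB}GiB unaccounted=${GAP_GIB}GiB"
```

Roughly 187 GiB on disk is unaccounted for by live sstables, which matches the suspicion of obsolete files below.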
>>
>> The above numbers indicate that some sstables are obsolete and still
>> occupying space on disk. What could be wrong? Will restarting the node
>> help? The cassandra process has been running for the last 45 days with no
>> downtime. However, because the disk usage is high, we are not able to run
>> a full compaction.
>>
>>
>>
>> Also, I can't find a reference to each of the sstables on disk in the
>> system.log file. For example, I have one data file on disk as (ls -lth):
>>
>> 86G Nov 20 06:14
>>
>>
>>
>> I have system.log file with first line:
>>
>> INFO [main] 2013-11-18 09:41:56,120 AbstractCassandraDaemon.java (line
>> 101) Logging initialized
>>
>>
>>
>> The 86G file must be the result of some compaction. I see no reference to
>> the data file in system.log between 11/18 and 11/25. What could be the
>> reason for that? The only reference is dated 11/29, when the file was
>> being streamed to another (new) node.
>>
>>
>>
>> How can I identify the obsolete files and remove them? I am thinking
>> about the following. Let me know if it makes sense.
>>
>> 1. Restart the node and check the state.
>>
>> 2. Move the oldest data files to another location (to another mount point)
>>
>> 3. Restart the node again
>>
>> 4. Run repair on the node so that it can get the missing data from its
>> peers.
>>
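A sketch of step 2 of the plan above, with hypothetical paths (in practice PARK_DIR would be the other mount point). The key details are to move rather than delete, and to move each sstable's companion files (-Index.db, -Filter.db, -Statistics.db, ...) along with its Data file; drain/stop the node first and run repair after the restart.

```shell
# Hypothetical paths; PARK_DIR would be a different mount point in practice.
DATA_DIR="${DATA_DIR:-/tmp/MyKeyspace-MyCF-data}"
PARK_DIR="${PARK_DIR:-/tmp/parked-sstables}"

mkdir -p "$PARK_DIR"

# Park the 5 oldest sstables: each is a set of files sharing a prefix,
# so move the whole set, not just the Data file.
ls -t "$DATA_DIR"/*-Data.db 2>/dev/null | tail -n 5 | while read -r f; do
  mv "${f%-Data.db}"-* "$PARK_DIR"/
done
```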
>>
>>
>>
>>
>> I compared the numbers of a healthy node for the same CF:
>>
>> 1. nodetool ring reported load as 662.95 GB
>>
>>
>>
>> 2. nodetool cfstats for the CF reported:
>>
>> SSTable count: 16
>>
>> Space used (live): 670524321067
>>
>> Space used (total): 670524321067
>>
>>
>>
>> 3. 'ls -1 *Data* | wc -l' in the data folder for CF returned
>>
>> 16
>>
>>
>>
>> 4. 'du -ksh .' in the data folder for CF returned
>>
>> 625G
>>
>>
>>
>>
>>
>> -Naren
>>
>>
>>
>>
>>
>>
>> --
>> Narendra Sharma
>>
>> Software Engineer
>>
>> http://www.aeris.com
>>
>> http://narendrasharma.blogspot.com/
>>
>>
>>
>
>
>
> --
> Narendra Sharma
> Software Engineer
> http://www.aeris.com
> http://narendrasharma.blogspot.com/
>
>
>


-- 
Narendra Sharma
Software Engineer
http://www.aeris.com
http://narendrasharma.blogspot.com/
