cassandra-user mailing list archives

From Narendra Sharma <narendra.sha...@gmail.com>
Subject Re: Cassandra 1.1.6 - Disk usage and Load displayed in ring doesn't match
Date Wed, 18 Dec 2013 19:15:19 GMT
Thanks Julien. We ran repair. Increasing the RF should not make sstables
obsolete. I can understand that reducing the RF or adding a new node can result
in a few obsolete sstables, which eventually go away after you run cleanup.
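The repair-then-cleanup sequence Julien describes could be sketched roughly like this. The keyspace name is a placeholder, and the DRY_RUN wrapper is my own addition so the commands are printed rather than executed by default:

```shell
# Hypothetical sketch: after raising RF, run this on each node in turn.
# "my_keyspace" is a placeholder; with DRY_RUN=1 (the default here) the
# commands are only printed, so nothing touches a real cluster.
KS=${KS:-my_keyspace}
DRY_RUN=${DRY_RUN:-1}
run() { if [ "$DRY_RUN" = "1" ]; then echo "$@"; else "$@"; fi; }
run nodetool repair "$KS"    # stream the extra replicas the new RF requires
run nodetool cleanup "$KS"   # then drop ranges this node no longer owns
```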


On Wed, Dec 18, 2013 at 1:49 AM, Julien Campan <julien.campan@gmail.com> wrote:

> Hi,
> When you increase the RF, you need to run repair for the
> keyspace on each node (because the data is not automatically streamed).
> After that, you should run cleanup on each node to remove obsolete
> sstables.
>
>
> Good luck :)
>
> Julien Campan.
>
> 2013/12/18 Aaron Morton <aaron@thelastpickle.com>
>
>> -tmp- files sit in the data dir. If there was an error creating them
>> during compaction or while flushing to disk, they will stay around until a restart.
>>
>> Check the logs for errors to see if compaction was failing on something.
>>
>> Cheers
>>
>>  -----------------
>> Aaron Morton
>> New Zealand
>> @aaronmorton
>>
>> Co-Founder & Principal Consultant
>> Apache Cassandra Consulting
>> http://www.thelastpickle.com
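The checks Aaron suggests could be sketched like this; the default paths are common packaged-install locations, not something taken from this thread:

```shell
# Hypothetical sketch: find leftover temp sstables and look for
# compaction failures in the log. Default paths are assumptions.
list_tmp_sstables() {    # leftover -tmp- files from interrupted compactions
    find "${1:-/var/lib/cassandra/data}" -name '*-tmp-*' -print 2>/dev/null
}
compaction_errors() {    # log lines that might explain a failed compaction
    grep -iE 'error|exception' "${1:-/var/log/cassandra/system.log}" 2>/dev/null |
        grep -i compact
}
# e.g.: list_tmp_sstables /var/lib/cassandra/data
#       compaction_errors /var/log/cassandra/system.log
```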
>>
>> On 17/12/2013, at 12:28 pm, Narendra Sharma <narendra.sharma@gmail.com>
>> wrote:
>>
>> No snapshots.
>>
>> I restarted the node and now the Load in ring is in sync with the disk
>> usage. Not sure what caused it to go out of sync. However, the live SSTable
>> count doesn't exactly match the number of data files on disk.
>>
>> I am going through the Cassandra code to understand what could be the
>> reason for the mismatch in the sstable count and also why there is no
>> reference of some of the data files in system.log.
>>
>>
>>
>>
>> On Mon, Dec 16, 2013 at 2:45 PM, Arindam Barua <abarua@247-inc.com> wrote:
>>
>>>
>>>
>>> Do you have any snapshots on the nodes where you are seeing this issue?
>>>
>>> Snapshots are hard links to sstables, which prevents the files from being deleted.
>>>
>>>
>>>
>>> -Arindam
>>>
>>>
>>>
>>> *From:* Narendra Sharma [mailto:narendra.sharma@gmail.com]
>>> *Sent:* Sunday, December 15, 2013 1:15 PM
>>> *To:* user@cassandra.apache.org
>>> *Subject:* Cassandra 1.1.6 - Disk usage and Load displayed in ring
>>> doesn't match
>>>
>>>
>>>
>>> We have an 8-node cluster. The replication factor is 3.
>>>
>>>
>>>
>>> For some of the nodes, the disk usage (du -ksh .) in the data directory
>>> for the CF doesn't match the Load reported by nodetool ring. When we
>>> expanded the cluster from 4 to 8 nodes (4 weeks back), everything was
>>> okay. Over the last 2-3 weeks the disk usage has gone up. We
>>> increased the RF from 2 to 3 two weeks ago.
>>>
>>>
>>>
>>> I am not sure if increasing the RF is causing this issue.
>>>
>>>
>>>
>>> For one of the nodes that I analyzed:
>>>
>>> 1. nodetool ring reported load as 575.38 GB
>>>
>>>
>>>
>>> 2. nodetool cfstats for the CF reported:
>>>
>>> SSTable count: 28
>>>
>>> Space used (live): 572671381955
>>>
>>> Space used (total): 572671381955
>>>
>>>
>>>
>>>
>>>
>>> 3. 'ls -1 *Data* | wc -l' in the data folder for CF returned
>>>
>>> 46
>>>
>>>
>>>
>>> 4. 'du -ksh .' in the data folder for CF returned
>>>
>>> 720G
>>>
>>>
>>>
>>> The above numbers indicate that some sstables are obsolete
>>> and still occupying space on disk. What could be wrong? Will
>>> restarting the node help? The cassandra process has been running for the last
>>> 45 days with no downtime. However, because the disk usage is high, we are
>>> not able to run a full compaction.
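A small sketch that gathers the same two on-disk numbers (data-file count and bytes used) for one CF, to put beside the cfstats output; CF_DIR is a placeholder path:

```shell
# Hypothetical sketch: report data-file count and bytes on disk for one
# CF directory, for comparison with "Space used (live)" from
# `nodetool cfstats`. The default path is a placeholder.
cf_disk_stats() {
    dir=${1:-/var/lib/cassandra/data/my_keyspace/my_cf}
    n=$(ls -1 "$dir"/*Data* 2>/dev/null | wc -l)        # data files on disk
    kb=$(du -sk "$dir" 2>/dev/null | awk '{print $1}')  # KB on disk
    echo "data files: $n, bytes on disk: $((kb * 1024))"
}
# e.g.: cf_disk_stats /var/lib/cassandra/data/my_keyspace/my_cf
```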
>>>
>>>
>>>
>>> Also, I can't find a reference to each of the sstables on disk in the
>>> system.log file. For example, I have one data file on disk as (ls -lth):
>>>
>>> 86G Nov 20 06:14
>>>
>>>
>>>
>>> I have system.log file with first line:
>>>
>>> INFO [main] 2013-11-18 09:41:56,120 AbstractCassandraDaemon.java (line
>>> 101) Logging initialized
>>>
>>>
>>>
>>> The 86G file must be the result of some compaction. I see no reference to this
>>> data file in system.log between 11/18 and 11/25. What could be the
>>> reason for that? The only reference is dated 11/29, when the file was being
>>> streamed to another node (a new node).
>>>
>>>
>>>
>>> How can I identify the obsolete files and remove them? I am thinking
>>> about the following; let me know if it makes sense.
>>>
>>> 1. Restart the node and check the state.
>>>
>>> 2. Move the oldest data files to another location (to another mount
>>> point)
>>>
>>> 3. Restart the node again
>>>
>>> 4. Run repair on the node so that it can get the missing data from its
>>> peers.
>>>
>>>
>>>
>>>
>>>
>>> I compared the numbers of a healthy node for the same CF:
>>>
>>> 1. nodetool ring reported load as 662.95 GB
>>>
>>>
>>>
>>> 2. nodetool cfstats for the CF reported:
>>>
>>> SSTable count: 16
>>>
>>> Space used (live): 670524321067
>>>
>>> Space used (total): 670524321067
>>>
>>>
>>>
>>> 3. 'ls -1 *Data* | wc -l' in the data folder for CF returned
>>>
>>> 16
>>>
>>>
>>>
>>> 4. 'du -ksh .' in the data folder for CF returned
>>>
>>> 625G
>>>
>>>
>>>
>>>
>>>
>>> -Naren
>>>
>>>
>>>
>>>
>>>
>>>
>>> --
>>> Narendra Sharma
>>>
>>> Software Engineer
>>>
>>> http://www.aeris.com
>>> http://narendrasharma.blogspot.com/
>>>
>>>
>>>
>>
>>
>>
>> --
>> Narendra Sharma
>> Software Engineer
>> http://www.aeris.com
>> http://narendrasharma.blogspot.com/
>>
>>
>>
>


-- 
Narendra Sharma
Software Engineer
http://www.aeris.com
http://narendrasharma.blogspot.com/
