incubator-cassandra-user mailing list archives

From aaron morton <aa...@thelastpickle.com>
Subject Re: Column Slice Query performance after deletions
Date Mon, 04 Mar 2013 06:25:24 GMT
> I need something to keep the deleted columns away from my query fetch. Not only the tombstones.
> It looks like the min compaction threshold might help on this. But I'm not sure yet on what
> would be a reasonable value for that threshold.
Your tombstones will not be purged in a compaction until after gc_grace, and only if all fragments
of the row are in the compaction. You're right that you would probably want to run repair during
the day if you are going to dramatically reduce gc_grace, to avoid deleted data coming back
to life. 
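
For illustration only, here is a rough sketch of lowering gc_grace on that one column family,
assuming a Python/pycassa client and made-up keyspace/column family names (use whatever tool you
actually drive schema changes with):

    from pycassa.system_manager import SystemManager

    sys_mgr = SystemManager('localhost:9160')
    # One hour instead of the default 864000 seconds (10 days). Only safe if
    # repair runs, or resurrected deletes are tolerable, within that window.
    sys_mgr.alter_column_family('MyKeyspace', 'MessageQueue', gc_grace_seconds=3600)
    sys_mgr.close()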

If you are using a single Cassandra row as a queue, you are going to have trouble. Leveled
compaction may help a little. 
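
If you do try Leveled, a sketch of flipping the strategy, again with hypothetical names and
assuming your pycassa version exposes the CfDef compaction_strategy attribute:

    from pycassa.system_manager import SystemManager

    sys_mgr = SystemManager('localhost:9160')
    # Assumption: this kwarg maps straight onto CfDef.compaction_strategy.
    sys_mgr.alter_column_family('MyKeyspace', 'MessageQueue',
                                compaction_strategy='LeveledCompactionStrategy')
    sys_mgr.close()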

If you are reading the "most recent" entries in the row, assuming the columns are sorted by
some timestamp, use the Reverse Comparator and issue slice commands to get the first X columns.
That will remove tombstones from the problem. (Am guessing this is not something you do, just
mentioning it.) 
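
A rough sketch of the slice side of that, assuming a Python/pycassa client, hypothetical
keyspace/column family/row key names, and a column family that already uses a reversed
(newest-first) comparator:

    from pycassa.pool import ConnectionPool
    from pycassa.columnfamily import ColumnFamily

    pool = ConnectionPool('MyKeyspace', ['localhost:9160'])
    cf = ColumnFamily(pool, 'MessageQueue')

    # With a reversed comparator the newest columns sort first, so a plain
    # slice of the first 100 columns returns the most recent entries and the
    # read can stop before it reaches the tombstones behind them.
    recent = cf.get('some_row_key', column_count=100)

    # Alternatively, without a reversed comparator (i.e. with the usual
    # oldest-first ordering), pycassa can issue the slice in reverse so the
    # read starts from the "new" end of the row:
    recent = cf.get('some_row_key', column_reversed=True, column_count=100)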

Your next option is to change the data model so you don't use the same row all day. 
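
For example (a sketch with hypothetical names, again assuming pycassa), you could bucket the
row key by hour so each hour's messages land in a fresh row and yesterday's tombstone-heavy
rows simply stop being read:

    from datetime import datetime
    import uuid

    from pycassa.pool import ConnectionPool
    from pycassa.columnfamily import ColumnFamily

    pool = ConnectionPool('MyKeyspace', ['localhost:9160'])
    cf = ColumnFamily(pool, 'MessageQueue')

    def bucket_key(queue_name, when=None):
        # One row per (queue, hour), e.g. "orders:2013030214"
        when = when or datetime.utcnow()
        return '%s:%s' % (queue_name, when.strftime('%Y%m%d%H'))

    # Writers append to the current hour's row (assumes a TimeUUIDType comparator)...
    cf.insert(bucket_key('orders'), {uuid.uuid1(): 'payload'})

    # ...and readers only slice the current bucket, so old rows full of
    # deleted columns are never scanned again.
    messages = cf.get(bucket_key('orders'), column_count=100)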

After that, consider a message queue. 

Cheers


-----------------
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 2/03/2013, at 12:03 PM, Víctor Hugo Oliveira Molinar <vhmolinar@gmail.com> wrote:

> Tombstones stay around until gc grace so you could lower that to see if that fixes the
> performance issues.
> 
> If the tombstones get collected, the column will live again, causing data inconsistency
> since I can't run a repair during the regular operations. Not sure if I got your thoughts
> on this.
> 
> 
> Size tiered or leveled compaction?
> 
> I'm actually running on Size Tiered Compaction, but I've been looking into changing
> it to Leveled, which seems to fit my case. Although even if I gain some performance, I would
> still have the same problem with the deleted columns.
> 
> 
> I need something to keep the deleted columns away from my query fetch. Not only the tombstones.
> It looks like the min compaction threshold might help on this. But I'm not sure yet on what
> would be a reasonable value for that threshold.
> 
> 
> On Sat, Mar 2, 2013 at 4:22 PM, Michael Kjellman <mkjellman@barracuda.com> wrote:
> Tombstones stay around until gc grace so you could lower that to see if that fixes the
> performance issues.
> 
> Size tiered or leveled compaction?
> 
> On Mar 2, 2013, at 11:15 AM, "Víctor Hugo Oliveira Molinar" <vhmolinar@gmail.com> wrote:
> 
>> What is your gc_grace set to? Sounds like as the number of tombstone records increases
>> your performance decreases. (Which I would expect)
>> 
>> gc_grace is default.
>> 
>> 
>> Cassandra's data files are write once. Deletes are another write. Until compaction
>> they all live on disk. Making really big rows has these problems.
>> Oh, so it looks like I should lower the min_compaction_threshold for this column
>> family. Right?
>> What does this threshold value really mean?
>> 
>> 
>> Guys, thanks for the help so far.
>> 
>> On Sat, Mar 2, 2013 at 3:42 PM, Michael Kjellman <mkjellman@barracuda.com> wrote:
>> What is your gc_grace set to? Sounds like as the number of tombstone records increases
>> your performance decreases. (Which I would expect)
>> 
>> On Mar 2, 2013, at 10:28 AM, "Víctor Hugo Oliveira Molinar" <vhmolinar@gmail.com> wrote:
>> 
>>> I have a daily maintenance of my cluster where I truncate this column family,
>>> because its data doesn't need to be kept for more than a day. 
>>> Since all the regular operations on it finish around 4 hours before the end of the day,
>>> I regularly run a truncate on it followed by a repair at the end of the day.
>>> 
>>> And every day, when the operations start (when there are only a few deleted columns),
>>> the performance looks pretty good.
>>> Unfortunately it degrades throughout the day.
>>> 
>>> 
>>> On Sat, Mar 2, 2013 at 2:54 PM, Michael Kjellman <mkjellman@barracuda.com> wrote:
>>> When was the last time you did a cleanup on the cf?
>>> 
>>> On Mar 2, 2013, at 9:48 AM, "Víctor Hugo Oliveira Molinar" <vhmolinar@gmail.com> wrote:
>>> 
>>> > Hello guys.
>>> > I'm investigating the reasons for performance degradation in my scenario, which
>>> > follows:
>>> >
>>> > - I have a column family which is filled with thousands of columns inside
>>> > a single row (varies between 10k ~ 200k). And I also have thousands of rows, not much
>>> > more than 15k.
>>> > - These rows are constantly updated, but the write load is not that intensive;
>>> > I estimate it at 100 writes/sec in the column family.
>>> > - Each column represents a message which is read and processed by another
>>> > process. After reading it, the column is marked for deletion in order to keep it out
>>> > of the next query on this row.
>>> >
>>> > Ok, so, I've figured out that after many insertions plus deletion updates,
>>> > my queries (column slice queries) are taking more time to be performed, even if there
>>> > are only a few columns, fewer than 100.
>>> >
>>> > So it looks like the greater the number of deleted columns,
>>> > the longer the time spent on a query.
>>> > -> Internally in C*, does a column slice query range over deleted columns?
>>> > If so, how can I mitigate the impact on my queries? Or, how can I avoid
>>> > those deleted columns?
>>> 
>> 
> 

