incubator-cassandra-user mailing list archives

From Edward Capriolo <edlinuxg...@gmail.com>
Subject Re: Column Slice Query performance after deletions
Date Sat, 02 Mar 2013 19:05:39 GMT
Cassandra's data files are write-once. A delete is just another write. Until
compaction, the original columns and the deletes all live on disk. Making
really big rows has this problem.
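
To illustrate the point above, here is a toy sketch in Python. It is not
Cassandra's actual read path or storage format; the names slice_query and
compact, the dict-based "sstables", and the integer timestamps are all
assumptions made up for this example. It only shows why a slice over a row
with many deleted columns still has to read the deleted cells until
compaction drops them.

```python
# Toy model only (assumed names, not Cassandra internals).
# Each "sstable" is an immutable dict: column name -> (timestamp, value).
# A delete does not remove anything; it writes a tombstone (value None).

TOMBSTONE = None


def _merge(sstables):
    """Reconcile all sstables; the newest timestamp wins per column.

    Returns the merged cells and the number of cells that had to be read.
    """
    merged = {}
    scanned = 0
    for table in sstables:
        for name, (ts, value) in table.items():
            scanned += 1
            if name not in merged or ts > merged[name][0]:
                merged[name] = (ts, value)
    return merged, scanned


def slice_query(sstables, limit):
    """Return up to `limit` live columns in name order, plus cells scanned."""
    merged, scanned = _merge(sstables)
    live = [(n, v) for n, (ts, v) in sorted(merged.items())
            if v is not TOMBSTONE]
    return live[:limit], scanned


def compact(sstables, now, gc_grace):
    """Merge into one sstable; drop tombstones at least gc_grace old."""
    merged, _ = _merge(sstables)
    kept = {n: (ts, v) for n, (ts, v) in merged.items()
            if v is not TOMBSTONE or now - ts < gc_grace}
    return [kept]


# 1000 messages written at time 1, 990 of them deleted at time 2.
inserts = {f"msg{i:04d}": (1, f"body{i}") for i in range(1000)}
deletes = {f"msg{i:04d}": (2, TOMBSTONE) for i in range(990)}
row = [inserts, deletes]

live, scanned = slice_query(row, limit=100)
print(len(live), "live columns,", scanned, "cells scanned before compaction")

# Compaction after gc_grace has elapsed purges the tombstones.
live, scanned = slice_query(compact(row, now=100, gc_grace=10), limit=100)
print(len(live), "live columns,", scanned, "cells scanned after compaction")
```

In this model the query returns the same 10 live columns both times, but
the work done depends on how many deleted cells are still on disk. If
gc_grace has not elapsed yet, compact() keeps the tombstones and the scan
stays expensive.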

On Sat, Mar 2, 2013 at 1:42 PM, Michael Kjellman <mkjellman@barracuda.com> wrote:

> What is your gc_grace set to? It sounds like your performance decreases as
> the number of tombstone records increases (which I would expect).
>
> On Mar 2, 2013, at 10:28 AM, "Víctor Hugo Oliveira Molinar" <
> vhmolinar@gmail.com> wrote:
>
> I have a daily maintenance job on my cluster where I truncate this column
> family, because its data doesn't need to be kept for more than a day.
> All the regular operations on it finish around 4 hours before the end of
> the day, so I regularly run a truncate on it, followed by a repair, at the
> end of the day.
>
> And every day, when the operations start (when there are only a few deleted
> columns), the performance looks pretty good.
> Unfortunately, it degrades over the course of the day.
>
>
> On Sat, Mar 2, 2013 at 2:54 PM, Michael Kjellman <mkjellman@barracuda.com> wrote:
>
>> When was the last time you did a cleanup on the cf?
>>
>> On Mar 2, 2013, at 9:48 AM, "Víctor Hugo Oliveira Molinar" <
>> vhmolinar@gmail.com> wrote:
>>
>> > Hello guys.
>> > I'm investigating the causes of performance degradation in my use case,
>> > which is as follows:
>> >
>> > - I have a column family with thousands of columns within a single row
>> > (varying between 10k and 200k). I also have thousands of rows, not
>> > much more than 15k.
>> > - These rows are constantly updated, but the write load is not that
>> > intensive. I estimate it at 100 writes/sec for the column family.
>> > - Each column represents a message which is read and processed by
>> > another process. After it is read, the column is marked for deletion
>> > to keep it out of the next query on this row.
>> >
>> > I've figured out that after many insertions and deletions, my queries
>> > (column slice queries) take more time to run, even when there are only
>> > a few columns, fewer than 100.
>> >
>> > So it looks like the more columns have been deleted, the longer a query
>> > takes.
>> > -> Internally in C*, does a column slice query range over deleted
>> > columns?
>> > If so, how can I mitigate the impact on my queries? Or, how can I avoid
>> > those deleted columns?
>>
>> Copy, by Barracuda, helps you store, protect, and share all your amazing
>> things. Start today: www.copy.com.
>>
>
>
>
