cassandra-user mailing list archives

From Jonathan Ellis <jbel...@gmail.com>
Subject Re: get_slice slow
Date Wed, 25 Aug 2010 16:58:22 GMT
in many cases, especially "give me the first column", slicing is
faster -- lots of tombstones around is one case where it might not be.
 if you can reduce the tombstone volume, say by switching to a new row
every 5 minutes, that would help a lot.
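
A minimal sketch of the "new row every 5 minutes" idea, assuming the queue's
row key can carry a time-bucket suffix. The class name QueueRowKeys, the
"queue:<bucket>" key format, and both helper methods are hypothetical and not
part of any Cassandra API; only the key arithmetic is shown. The resulting
string would be used as the rowKey passed to get_slice and to the
insert/remove calls.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: derives time-bucketed row keys for a queue so that
// each row only receives writes (and deletes) for one 5-minute window.
final class QueueRowKeys {
    // Bucket width taken from the "every 5 minutes" suggestion above.
    static final long BUCKET_MILLIS = 5 * 60 * 1000L;

    // Row key for the bucket that contains the given timestamp.
    static String bucketKey(String queueName, long timestampMillis) {
        long bucket = Math.floorDiv(timestampMillis, BUCKET_MILLIS);
        return queueName + ":" + bucket;
    }

    // Row keys a consumer would visit, oldest first, to drain everything
    // written between oldestMillis and nowMillis (both inclusive).
    static List<String> bucketsToScan(String queueName, long oldestMillis, long nowMillis) {
        List<String> keys = new ArrayList<>();
        long first = Math.floorDiv(oldestMillis, BUCKET_MILLIS);
        long last = Math.floorDiv(nowMillis, BUCKET_MILLIS);
        for (long b = first; b <= last; b++) {
            keys.add(queueName + ":" + b);
        }
        return keys;
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        // Producer side: write new columns into the current bucket's row.
        System.out.println(bucketKey("queue", now));
        // Consumer side: rows to slice, oldest first, for the last 12 minutes.
        System.out.println(bucketsToScan("queue", now - 12 * 60 * 1000L, now));
    }
}
```

A consumer would slice the oldest bucket first and move to the next key once
a bucket comes back empty, so the tombstones left behind in drained rows
should no longer be scanned on every read.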

On Wed, Aug 25, 2010 at 11:43 AM, B. Todd Burruss <bburruss@real.com> wrote:
> i did check sstables, and there are only three.  i haven't done any major
> compacts.
>
> do u think it is taking so long because it must sift thru the deleted
> columns before compaction?
>
> so accessing a column by name instead of slice predicate is faster?
>
>
> On 08/24/2010 11:23 PM, Benjamin Black wrote:
>>
>> Todd,
>>
>> This is a really bad idea.  What you are likely doing is spreading
>> that single row across a large number of sstables.  The more columns
>> you insert, the more sstables you are likely inspecting, the longer
>> the get_slice operations will take.  You can test whether this is so
>> by running nodetool compact when things start slowing down.  If it
>> speeds up, that is likely the problem.  If you are deleting that much,
>> you should also tune GCGraceSeconds way down (from the default of 10
>> days) so the space is reclaimed on major compaction and, again, there
>> are fewer things to inspect.
>>
>> Long rows written over long periods of time are almost certain to give
>> worse read performance, even far worse, than rows written all at once.
>>
>>
>> b
>>
>> On Tue, Aug 24, 2010 at 10:17 PM, B. Todd Burruss <bburruss@real.com> wrote:
>>
>>>
>>> thx artie,
>>>
>>> i haven't used a super CF because i thought it has more trouble doing
>>> slices because the entire row must be deserialized to get to the
>>> subcolumn you want?
>>>
>>> iostat is nothing, 0.0.  i have plenty of RAM and the OS is I/O caching
>>> nicely.
>>>
>>> i haven't used the key cache, because i only have one key, the row of the
>>> queue ;)
>>>
>>> i haven't used row cache because i need the row to grow quite large,
>>> millions of columns.  and the size of data could be arbitrary - right now
>>> i am testing with < 32 byte values per column.
>>>
>>> i do need quorum consistency.
>>>
>>> i have read previously that some folks are using a single row with
>>> millions of columns.  is anyone using get_slice to pick off the first or
>>> the last column in the row?
>>>
>>> On 08/24/2010 09:25 PM, Artie Copeland wrote:
>>>
>>> Have you tried using a super column? it seems that having a row with over
>>> 100K columns and growing would be a lot for cassandra to deserialize.
>>> what are iostat and jmeter telling you? it would be interesting to see
>>> that data.  also, what are you using for your key or row caching?  do you
>>> need to use quorum consistency? that can slow down reads as well.  can
>>> you use a lower consistency level?
>>>
>>> Artie
>>> On Tue, Aug 24, 2010 at 9:14 PM, B. Todd Burruss <bburruss@real.com> wrote:
>>>
>>>>
>>>> i am using get_slice to pull columns from a row to emulate a queue.
>>>> column names are TimeUUID and the values are small, < 32 bytes.  simple
>>>> ColumnFamily.
>>>>
>>>> i am using SlicePredicate like this to pull the first ("oldest") column
>>>> in the row:
>>>>
>>>>        SlicePredicate predicate = new SlicePredicate();
>>>>        predicate.setSlice_range(
>>>>            new SliceRange(new byte[] {}, new byte[] {}, false, 1));
>>>>
>>>>        get_slice(rowKey, colParent, predicate, QUORUM);
>>>>
>>>> once i get the column i remove it.  so there are a lot of gets and
>>>> mutates, leaving lots of deleted columns.
>>>>
>>>> get_slice starts off performing just fine, but then falls off
>>>> dramatically as the number of columns grows.  at its peak there are
>>>> 100,000 columns and get_slice is taking over 100ms to return.
>>>>
>>>> i am running a single instance of cassandra 0.7 on localhost, default
>>>> config.  i've done some googling and can't find any tweaks or tuning
>>>> suggestions specific to get_slice.  i already know about separating
>>>> commitlog and data, watching iostat, GC, etc.
>>>>
>>>> any low hanging tuning fruit anyone can think of?  in 0.6 i recall an
>>>> index for columns, maybe that is what i need?
>>>>
>>>> thx
>>>>
>>>
>>>
>>> --
>>> http://yeslinux.org
>>> http://yestech.org
>>>
>>>
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com
