cassandra-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From André Cruz <andre.c...@co.sapo.pt>
Subject Re: Strange delay in query
Date Fri, 09 Nov 2012 21:34:50 GMT
That must be it. I dumped the sstables to json and there are lots of records, including ones
that are returned to my application, that have the deletedAt attribute. I think this is because
the regular repair job was not running for some time, surely more than the grace period, and
lots of tombstones stayed behind even though we are running repair regularly now.

Thanks!
André

On Nov 8, 2012, at 10:51 PM, Josep Blanquer <blanquer@rightscale.com> wrote:

> Can it be that you have tons and tons of tombstoned columns in the middle of these two?
I've seen plenty of performance issues with wide rows littered with column tombstones (you
could check with dumping the sstables...)
> 
> Just a thought...
> 
> Josep M.
> 
> On Thu, Nov 8, 2012 at 12:23 PM, André Cruz <andre.cruz@co.sapo.pt> wrote:
> These are the two columns in question:
> 
> => (super_column=13957152-234b-11e2-92bc-e0db550199f4,
>      (column=attributes, value=, timestamp=1351681613263657)
>      (column=blocks, value=A4edo5MhHvojv3Ihx_JkFMsF3ypthtBvAZkoRHsjulw06pez86OHch3K3OpmISnDjHODPoCf69bKcuAZSJj-4Q,
timestamp=1351681613263657)
>      (column=hash, value=8_p2QaeRaX_QwJbUWQ07ZqlNHei7ixu0MHxgu9oennfYOGfyH6EsEe_LYO8V8EC_1NPL44Gx8B7UhYV9VSb7Lg,
timestamp=1351681613263657)
>      (column=icon, value=image_jpg, timestamp=1351681613263657)
>      (column=is_deleted, value=true, timestamp=1351681613263657)
>      (column=is_dir, value=false, timestamp=1351681613263657)
>      (column=mime_type, value=image/jpeg, timestamp=1351681613263657)
>      (column=mtime, value=1351646803, timestamp=1351681613263657)
>      (column=name, value=/Mobile Photos/Photo 2012-10-28 17_13_50.jpeg, timestamp=1351681613263657)
>      (column=revision, value=13957152-234b-11e2-92bc-e0db550199f4, timestamp=1351681613263657)
>      (column=size, value=1379001, timestamp=1351681613263657)
>      (column=thumb_exists, value=true, timestamp=1351681613263657))
> => (super_column=40b7ae4e-2449-11e2-8610-e0db550199f4,
>      (column=attributes, value={"posix": 420}, timestamp=1351790781154800)
>      (column=blocks, value=9UCDkHNb8-8LuKr2bv9PjKcWCT0v7FCZa0ebNSflES4-o7QD6eYschVaweCKSbR29Dq2IeGl_Cu7BVnYJYphTQ,
timestamp=1351790781154800)
>      (column=hash, value=kao2EV8jw_wN4EBoMkCXZWCwg3qQ0X6m9_X9JIGkEkiGKJE_JeKgkdoTAkAefXgGtyhChuhWPlWMxl_tX7VZUw,
timestamp=1351790781154800)
>      (column=icon, value=text_txt, timestamp=1351790781154800)
>      (column=is_dir, value=false, timestamp=1351790781154800)
>      (column=mime_type, value=text/plain, timestamp=1351790781154800)
>      (column=mtime, value=1351378576, timestamp=1351790781154800)
>      (column=name, value=/Documents/VIMDocument.txt, timestamp=1351790781154800)
>      (column=revision, value=40b7ae4e-2449-11e2-8610-e0db550199f4, timestamp=1351790781154800)
>      (column=size, value=13, timestamp=1351790781154800)
>      (column=thumb_exists, value=false, timestamp=1351790781154800))
> 
> 
> I don't think their size is an issue here.
> 
> André
> 
> On Nov 8, 2012, at 6:04 PM, Andrey Ilinykh <ailinykh@gmail.com> wrote:
> 
>> What is the size of columns? Probably those two are huge.
>> 
>> 
>> On Thu, Nov 8, 2012 at 4:01 AM, André Cruz <andre.cruz@co.sapo.pt> wrote:
>> On Nov 7, 2012, at 12:15 PM, André Cruz <andre.cruz@co.sapo.pt> wrote:
>> 
>> > This error also happens on my application that uses pycassa, so I don't think
this is the same bug.
>> 
>> I have narrowed it down to a slice between two consecutive columns. Observe this
behaviour using pycassa:
>> 
>> >>> DISCO_CASS.col_fam_nsrev.get(uuid.UUID('3cd88d97-ffde-44ca-8ae9-5336caaebc4e'),
column_count=2, column_start=uuid.UUID('13957152-234b-11e2-92bc-e0db550199f4')).keys()
>> DEBUG 2012-11-08 11:55:51,170 pycassa_library.pool:30 6849 139928791262976 Connection
52905488 (xxx:9160) was checked out from pool 51715344
>> DEBUG 2012-11-08 11:55:53,415 pycassa_library.pool:37 6849 139928791262976 Connection
52905488 (xxx:9160) was checked in to pool 51715344
>> [UUID('13957152-234b-11e2-92bc-e0db550199f4'), UUID('40b7ae4e-2449-11e2-8610-e0db550199f4')]
>> 
>> A two column slice took more than 2s to return. If I request the next 2 column slice:
>> 
>> >>> DISCO_CASS.col_fam_nsrev.get(uuid.UUID('3cd88d97-ffde-44ca-8ae9-5336caaebc4e'),
column_count=2, column_start=uuid.UUID('40b7ae4e-2449-11e2-8610-e0db550199f4')).keys()
>> DEBUG 2012-11-08 11:57:32,750 pycassa_library.pool:30 6849 139928791262976 Connection
52904912 (xxx:9160) was checked out from pool 51715344
>> DEBUG 2012-11-08 11:57:32,774 pycassa_library.pool:37 6849 139928791262976 Connection
52904912 (xxx:9160) was checked in to pool 51715344
>> [UUID('40b7ae4e-2449-11e2-8610-e0db550199f4'), UUID('a364b028-2449-11e2-8882-e0db550199f4')]
>> 
>> This takes 20msec... Is there a rational explanation for this different behaviour?
Is there some threshold that I'm running into? Is there any way to obtain more debugging information
about this problem?
>> 
>> Thanks,
>> André
>> 
> 
> 


Mime
View raw message