FYI: Repair does not remove tombstones. To remove tombstones you need to run compaction.
If you have a lot of data then make sure you run compaction on all nodes before running repair. We had a big trouble with our system regarding tombstone and it took us long time to figure out the reason. It turned out that repair process also transfers TTLed data (compaction is not triggered yet) to the other nodes even that data was removed from the other nodes in the compaction phase before that.

-Binh

On Fri, Nov 9, 2012 at 1:34 PM, André Cruz <andre.cruz@co.sapo.pt> wrote:
That must be it. I dumped the sstables to json and there are lots of records, including ones that are returned to my application, that have the deletedAt attribute. I think this is because the regular repair job was not running for some time, surely more than the grace period, and lots of tombstones stayed behind even though we are running repair regularly now.

Thanks!
André

On Nov 8, 2012, at 10:51 PM, Josep Blanquer <blanquer@rightscale.com> wrote:

Can it be that you have tons and tons of tombstoned columns in the middle of these two? I've seen plenty of performance issues with wide rows littered with column tombstones (you could check with dumping the sstables...)

Just a thought...

Josep M.

On Thu, Nov 8, 2012 at 12:23 PM, André Cruz <andre.cruz@co.sapo.pt> wrote:
These are the two columns in question:

=> (super_column=13957152-234b-11e2-92bc-e0db550199f4,
     (column=attributes, value=, timestamp=1351681613263657)
     (column=blocks, value=A4edo5MhHvojv3Ihx_JkFMsF3ypthtBvAZkoRHsjulw06pez86OHch3K3OpmISnDjHODPoCf69bKcuAZSJj-4Q, timestamp=1351681613263657)
     (column=hash, value=8_p2QaeRaX_QwJbUWQ07ZqlNHei7ixu0MHxgu9oennfYOGfyH6EsEe_LYO8V8EC_1NPL44Gx8B7UhYV9VSb7Lg, timestamp=1351681613263657)
     (column=icon, value=image_jpg, timestamp=1351681613263657)
     (column=is_deleted, value=true, timestamp=1351681613263657)
     (column=is_dir, value=false, timestamp=1351681613263657)
     (column=mime_type, value=image/jpeg, timestamp=1351681613263657)
     (column=mtime, value=1351646803, timestamp=1351681613263657)
     (column=name, value=/Mobile Photos/Photo 2012-10-28 17_13_50.jpeg, timestamp=1351681613263657)
     (column=revision, value=13957152-234b-11e2-92bc-e0db550199f4, timestamp=1351681613263657)
     (column=size, value=1379001, timestamp=1351681613263657)
     (column=thumb_exists, value=true, timestamp=1351681613263657))
=> (super_column=40b7ae4e-2449-11e2-8610-e0db550199f4,
     (column=attributes, value={"posix": 420}, timestamp=1351790781154800)
     (column=blocks, value=9UCDkHNb8-8LuKr2bv9PjKcWCT0v7FCZa0ebNSflES4-o7QD6eYschVaweCKSbR29Dq2IeGl_Cu7BVnYJYphTQ, timestamp=1351790781154800)
     (column=hash, value=kao2EV8jw_wN4EBoMkCXZWCwg3qQ0X6m9_X9JIGkEkiGKJE_JeKgkdoTAkAefXgGtyhChuhWPlWMxl_tX7VZUw, timestamp=1351790781154800)
     (column=icon, value=text_txt, timestamp=1351790781154800)
     (column=is_dir, value=false, timestamp=1351790781154800)
     (column=mime_type, value=text/plain, timestamp=1351790781154800)
     (column=mtime, value=1351378576, timestamp=1351790781154800)
     (column=name, value=/Documents/VIMDocument.txt, timestamp=1351790781154800)
     (column=revision, value=40b7ae4e-2449-11e2-8610-e0db550199f4, timestamp=1351790781154800)
     (column=size, value=13, timestamp=1351790781154800)
     (column=thumb_exists, value=false, timestamp=1351790781154800))


I don't think their size is an issue here.

André

On Nov 8, 2012, at 6:04 PM, Andrey Ilinykh <ailinykh@gmail.com> wrote:

What is the size of columns? Probably those two are huge.


On Thu, Nov 8, 2012 at 4:01 AM, André Cruz <andre.cruz@co.sapo.pt> wrote:
On Nov 7, 2012, at 12:15 PM, André Cruz <andre.cruz@co.sapo.pt> wrote:

> This error also happens on my application that uses pycassa, so I don't think this is the same bug.

I have narrowed it down to a slice between two consecutive columns. Observe this behaviour using pycassa:

>>> DISCO_CASS.col_fam_nsrev.get(uuid.UUID('3cd88d97-ffde-44ca-8ae9-5336caaebc4e'), column_count=2, column_start=uuid.UUID('13957152-234b-11e2-92bc-e0db550199f4')).keys()
DEBUG 2012-11-08 11:55:51,170 pycassa_library.pool:30 6849 139928791262976 Connection 52905488 (xxx:9160) was checked out from pool 51715344
DEBUG 2012-11-08 11:55:53,415 pycassa_library.pool:37 6849 139928791262976 Connection 52905488 (xxx:9160) was checked in to pool 51715344
[UUID('13957152-234b-11e2-92bc-e0db550199f4'), UUID('40b7ae4e-2449-11e2-8610-e0db550199f4')]

A two column slice took more than 2s to return. If I request the next 2 column slice:

>>> DISCO_CASS.col_fam_nsrev.get(uuid.UUID('3cd88d97-ffde-44ca-8ae9-5336caaebc4e'), column_count=2, column_start=uuid.UUID('40b7ae4e-2449-11e2-8610-e0db550199f4')).keys()
DEBUG 2012-11-08 11:57:32,750 pycassa_library.pool:30 6849 139928791262976 Connection 52904912 (xxx:9160) was checked out from pool 51715344
DEBUG 2012-11-08 11:57:32,774 pycassa_library.pool:37 6849 139928791262976 Connection 52904912 (xxx:9160) was checked in to pool 51715344
[UUID('40b7ae4e-2449-11e2-8610-e0db550199f4'), UUID('a364b028-2449-11e2-8882-e0db550199f4')]

This takes 20msec... Is there a rational explanation for this different behaviour? Is there some threshold that I'm running into? Is there any way to obtain more debugging information about this problem?

Thanks,
André