You might also want to check whether it's due to disk seeking.
There might be some tuning you can do (key cache, etc.), though I can't speak to that in your particular case, and with 50 column families you'd probably run into pretty bad memory limits.

However, having been in a similar situation myself, I'd suggest experimenting with different batch sizes for the number of rows (e.g. 1 request for 900 keys vs. 9 requests for 100 keys each). This has helped me solve timeout problems when retrieving "large" numbers of rows in the past, and it reduced overall retrieval time. I know that at least the pycassa client supports this type of multiget out of the box.

On Wed, Aug 31, 2011 at 5:13 AM, Renato Bacelar da Silveira <email@example.com> wrote:
I am running a query against a node with about 50 column families.
At present, one of the column families has 2,502,000 rows, and each row
contains 100 columns.
I am searching for 3 columns specifically, and am doing so with Thrift's
multiget_slice(). I prepare a statement with about 900 row keys, each
searching for a slice of 3 specific columns.
My average time taken to return from the multiget_slice() is about 4
seconds. I performed a comparable query in MySQL, and the results
were returned to me in 0.75 seconds on average.
Is 4 seconds way too much time for Cassandra? I am sure this could
be under 1 second, like MySQL.
I have resized the Thrift transport size to just 1MB so as not to encounter
any timeouts, which I have seen noted as a risk if you push too many queries
through. Is this a correct assumption?
So is it too much to push 900 keys in a multiget_slice() at once? I read
that it does a concurrent fetch. I can understand threads racing for
cycles, causing waits, but somehow I think I am wrong somewhere.
Regards to ALL!
Renato da Silveira
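
For anyone who wants to try the batch-size experiment suggested in the reply, here is a minimal sketch in Python. It is not tied to any client: `fetch` is a hypothetical callable that takes a list of row keys and returns a dict of rows, and `fake_fetch` is only a stand-in so the example runs without a cluster. The helper names (`chunked`, `fetch_in_batches`, `time_batch_sizes`) are made up for illustration.

```python
import time


def chunked(keys, batch_size):
    """Yield successive batches of at most batch_size keys."""
    for start in range(0, len(keys), batch_size):
        yield keys[start:start + batch_size]


def fetch_in_batches(fetch, keys, batch_size):
    """Call fetch(batch) once per batch and merge the returned dicts."""
    rows = {}
    for batch in chunked(keys, batch_size):
        rows.update(fetch(batch))
    return rows


def time_batch_sizes(fetch, keys, batch_sizes):
    """Return {batch_size: elapsed_seconds} for each candidate size."""
    timings = {}
    for size in batch_sizes:
        started = time.perf_counter()
        fetch_in_batches(fetch, keys, size)
        timings[size] = time.perf_counter() - started
    return timings


def fake_fetch(batch):
    """Hypothetical stand-in for a real multiget call against the cluster."""
    return {key: {"col1": 1, "col2": 2, "col3": 3} for key in batch}


# 900 row keys, as in the original question; try 9x100, 3x300 and 1x900.
keys = ["row%d" % i for i in range(900)]
timings = time_batch_sizes(fake_fetch, keys, [100, 300, 900])
for size, seconds in sorted(timings.items()):
    print("batch size %d: %.6f s" % (size, seconds))
```

With a real client you would replace `fake_fetch` with a function that wraps its multiget call (with pycassa, presumably something along the lines of `cf.multiget(batch, columns=[...])`; treat that as an assumption and check the client's documentation). Repeat each measurement a few times, since the first pass may warm caches and skew the comparison.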