cassandra-user mailing list archives

From Renato Bacelar da Silveira <renat...@indabamobile.co.za>
Subject Re: Cassandra Reads a little slow - 900 keys takes 4 seconds.
Date Thu, 01 Sep 2011 12:23:46 GMT
Thank you Yan and Dan

I have done all the tweaking I can, and even looked at
wire latencies that might be adding to the 4 seconds,
but that is the figure I keep coming up with.

Concerning the memory issue mentioned, the machine
is working fine. There is a lot of fluctuation on the Cassandra
heap itself, but no OutOfMemory errors yet, even under
full-blown query loads such as 200 threads issuing 10000 queries each,
so things seem to be stable.

But if one thread issues a single query, as mentioned below,
with 900 keys, on a CF with 2.5 million rows, the query takes
4 secs. A comparable query in MySQL, with similar data both
in the query string and in the tables, averaged 0.75ms
against the same machine in the cluster. So wire latency was not
the issue, and I think hardware was not either.

I will do some more tweaking, and when the result time gets to something
comparable I will post my findings.

Regards to ALL!

On 01/09/2011 03:34, Yang wrote:
> you might also want to try to see if it's due to disk seeking.
>
> you verify this by increasing your memory/heap size, or writing your 
> files to a ram disk /tmpfs
>
>
>
> On Wed, Aug 31, 2011 at 4:57 PM, Dan Kuebrich <dan.kuebrich@gmail.com> wrote:
>
>     There might be some tuning you can do--key cache, etc--though I
>     can't speak to that in your particular case and with 50 column
>     families you'd probably run into pretty bad memory limits.
>
>     However, having found myself in a similar situation in the past,
>     you might consider experimentally trying different batch sizes on
>     the # of rows (eg 1 request for 900 vs 9 for 100 each, etc).  This
>     has helped me solve timeout problems when retrieving "large"
>     numbers of rows in the past and reduced overall retrieval time.  I
>     know that at least the pycassa client supports this type of
>     multiget out of the box.
>
>     On Wed, Aug 31, 2011 at 5:13 AM, Renato Bacelar da Silveira
>     <renatods@indabamobile.co.za> wrote:
>
>         Hi All
>
>         I am running a query against a node with about 50 Column Families.
>
>         At present One of the column families has 2,502,000 rows, each row
>         contains 100 columns.
>
>         I am searching for 3 columns specifically, and am doing so
>         with Thrift's
>         multiget_slice(). I prepare a statement with about 900 row 
>         keys, each
>         searching for a slice of 3 specific columns.
>
>         My average time taken to return from the multiget_slice() is
>         about 4 seconds. I performed a comparable query in MySQL, and
>         the results were returned to me in 0.75 seconds on average.
>
>         Is 4 seconds way too much time for Cassandra? I am sure this
>         could be under 1 second, like MySQL.
>
>         I have resized the Thrift transport size to just 1MB so as not
>         to encounter any timeouts, which are noted if you push too many
>         queries through. Is this a correct assumption?
>
>         So is it too much to push 900 keys through a multiget_slice() at
>         once? I read that it does a concurrent fetch. I can understand
>         threads racing for cycles, causing waits, but somehow I think I
>         am wrong somewhere.
>
>         Regards to ALL!
>
>
>
>         Renato da Silveira
>         Senior Developer
>         www.indabamobile.co.za
>
>
>
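Dan's batching suggestion above amounts to splitting one large multiget
into several smaller requests. A minimal client-side sketch in Python,
where `fetch` is a hypothetical stand-in for the actual client call
(e.g. pycassa's multiget or a raw Thrift multiget_slice) — the function
and variable names here are illustrative, not from any library:

```python
def chunked(keys, batch_size):
    """Yield successive batches of row keys."""
    for i in range(0, len(keys), batch_size):
        yield keys[i:i + batch_size]

def batched_multiget(fetch, keys, batch_size=100):
    """Issue one smaller request per batch instead of a single
    900-key call; `fetch` stands in for the client's multiget."""
    results = {}
    for batch in chunked(keys, batch_size):
        results.update(fetch(batch))
    return results

# Example with a dummy fetch: 900 keys in batches of 100 -> 9 requests.
calls = []
def dummy_fetch(batch):
    calls.append(len(batch))
    return {k: None for k in batch}

keys = ["key%d" % i for i in range(900)]
out = batched_multiget(dummy_fetch, keys, batch_size=100)
```

The right batch size is workload-dependent; as Dan notes, it is worth
measuring a few sizes (900x1, 100x9, etc.) rather than assuming one.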

