incubator-cassandra-user mailing list archives

From Bill Speirs <bill.spe...@gmail.com>
Subject Re: Super Slow Multi-gets
Date Thu, 10 Feb 2011 21:45:29 GMT
That doesn't seem to help; I just get a bunch of messages that look like this:

DEBUG - Transport open status true for client CassandraClient<devb01:9160-13>
DEBUG - Status of releaseClient CassandraClient<unixdevb01:9160-13> to
queue: true
DEBUG - Transport open status true for client CassandraClient<devb01:9160-14>

And I got those before with my other setting...

Bill-

On Thu, Feb 10, 2011 at 4:37 PM, Aaron Morton <aaron@thelastpickle.com> wrote:
> Assuming Cassandra 0.7, in log4j-server.properties make the root logger line look like this:
> log4j.rootLogger=DEBUG,stdout,R
>
>
> A
> On 11 Feb, 2011,at 10:30 AM, Bill Speirs <bill.speirs@gmail.com> wrote:
>
> I switched my implementation to use a thread pool of 10 threads each
> multi-getting 10 keys/rows. This reduces my time from 50s to 5s for
> fetching all 1,000 messages.
>
> I started looking through the Cassandra source to find where the
> parallel requests are actually made, and I believe it's in fetchRows in
> org.apache.cassandra.service.StorageProxy. Is this correct? I noticed a
> number of logger.debug calls. What do I need to set in my
> log4j.properties file to see these messages? They would probably help me
> determine what is taking so long. Currently my log4j.properties file
> looks like this, and I'm not seeing these messages:
>
> log4j.appender.stdout=org.apache.log4j.ConsoleAppender
> log4j.appender.stdout.layout=org.apache.log4j.SimpleLayout
> log4j.category.org.apache=DEBUG, stdout
> log4j.category.me.prettyprint=DEBUG, stdout
>
> Thanks...
>
> Bill-
>
>
> On Thu, Feb 10, 2011 at 12:53 PM, Bill Speirs <bill.speirs@gmail.com> wrote:
>> Each message row is well under 1K. So I don't think it is network... plus
>> all boxes are on a fast LAN.
>>
>> Bill-
>>
>> On Feb 10, 2011 11:59 AM, "Utku Can Topçu" <utku@topcu.gen.tr> wrote:
>>> Dear Bill,
>>>
>>> How about the size of the rows in the Messages CF? Are they too big?
>>> Might you be running into bandwidth overhead?
>>>
>>> Regards,
>>> Utku
>>>
>>> On Thu, Feb 10, 2011 at 5:00 PM, Bill Speirs <bill.speirs@gmail.com>
>>> wrote:
>>>
>>>> I have a 7-node setup with a replication factor of 1 and a read
>>>> consistency of 1. I have two column families: Messages, which stores
>>>> millions of rows with a UUID for the row key, and DateIndex, which stores
>>>> thousands of rows with a String as the row key. I perform 2 look-ups
>>>> for my queries:
>>>>
>>>> 1) Fetch the row from DateIndex that includes the date I'm looking
>>>> for. This returns 1,000 columns where the column names are the UUID of
>>>> the messages
>>>> 2) Do a multi-get (Hector client) using those 1,000 row keys I got
>>>> from the first query.
>>>>
>>>> Query 1 is taking ~300ms to fetch 1,000 columns from a single row...
>>>> respectable. However, query 2 is taking over 50s to perform 1,000 row
>>>> look-ups! Also, when I scale down to 100 row look-ups for query 2, the
>>>> time scales in a similar fashion, down to 5s.
>>>>
>>>> Am I doing something wrong here? It seems like taking 5s to look-up
>>>> 100 rows in a distributed hash table is way too slow.
>>>>
>>>> Thoughts?
>>>>
>>>> Bill-
>>>>
>>
>
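
For reference, the workaround described in the quoted message (a pool of 10 threads issuing multi-gets of 10 keys each) might look roughly like the sketch below. This is only an illustration under assumptions, not the code actually used in this thread: the class name BatchedMultiGet, the method fetchAll, and the multiGet function parameter are all hypothetical, and multiGet stands in for whatever client call (for example a Hector multi-get) really fetches a batch of rows.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical helper: none of these names come from Hector or Cassandra.
class BatchedMultiGet {

    /**
     * Splits keys into batches of batchSize and runs multiGet on each batch
     * using a fixed pool of `threads` worker threads, then merges the results.
     * multiGet is an assumed stand-in for the real client multi-get call.
     */
    static <K, V> Map<K, V> fetchAll(List<K> keys, int batchSize, int threads,
                                     Function<List<K>, Map<K, V>> multiGet) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Map<K, V>>> futures = new ArrayList<>();
            for (int i = 0; i < keys.size(); i += batchSize) {
                // Copy the slice so each task owns an independent list.
                int end = Math.min(i + batchSize, keys.size());
                List<K> batch = new ArrayList<>(keys.subList(i, end));
                Callable<Map<K, V>> task = () -> multiGet.apply(batch);
                futures.add(pool.submit(task));
            }
            // Merge on the calling thread; get() blocks until each batch is done.
            Map<K, V> merged = new HashMap<>();
            for (Future<Map<K, V>> future : futures) {
                merged.putAll(future.get());
            }
            return merged;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

With 1,000 keys, a call such as BatchedMultiGet.fetchAll(keys, 10, 10, batch -> client.multiGet(batch)) would issue 100 batch requests, at most 10 in flight at a time (client.multiGet is again a placeholder). Whether 10 and 10 are good values depends on the cluster and the client's connection pool.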
