cassandra-user mailing list archives

From Bill Speirs <bill.spe...@gmail.com>
Subject Re: Super Slow Multi-gets
Date Fri, 11 Feb 2011 12:10:03 GMT
Sorry, I was setting the file on my client not the server. I will make this
change and get back to you.

Thanks again for the help...

Bill-

On Feb 10, 2011 4:45 PM, "Bill Speirs" <bill.speirs@gmail.com> wrote:
> Doesn't seem to help, I just get a bunch of messages that look like this:
>
> DEBUG - Transport open status true for client CassandraClient<devb01:9160-13>
> DEBUG - Status of releaseClient CassandraClient<unixdevb01:9160-13> to queue: true
> DEBUG - Transport open status true for client CassandraClient<devb01:9160-14>
>
> And I got those before with my other setting...
>
> Bill-
>
> On Thu, Feb 10, 2011 at 4:37 PM, Aaron Morton <aaron@thelastpickle.com> wrote:
>> Assuming Cassandra 0.7, in log4j-server.properties make it look like this...
>> log4j.rootLogger=DEBUG,stdout,R
>>
>>
>> A
>> On 11 Feb, 2011, at 10:30 AM, Bill Speirs <bill.speirs@gmail.com> wrote:
>>
>> I switched my implementation to use a thread pool of 10 threads each
>> multi-getting 10 keys/rows. This reduces my time from 50s to 5s for
>> fetching all 1,000 messages.
>>
>> I started looking through the Cassandra source to find where the
>> parallel requests are actually made, and I believe it's in
>> org.apache.cassandra.service.StorageProxy.java fetchRows. Is this
>> correct? I noticed a number of logger.debug calls. What do I need to
>> set in my log4j.properties file to see these messages? They would
>> probably help me determine what is taking so long. Currently my
>> log4j.properties file looks like this, and I'm not seeing these
>> messages:
>>
>> log4j.appender.stdout=org.apache.log4j.ConsoleAppender
>> log4j.appender.stdout.layout=org.apache.log4j.SimpleLayout
>> log4j.category.org.apache=DEBUG, stdout
>> log4j.category.me.prettyprint=DEBUG, stdout
>>
>> Thanks...
>>
>> Bill-
>>
>>
>> On Thu, Feb 10, 2011 at 12:53 PM, Bill Speirs <bill.speirs@gmail.com> wrote:
>>> Each message row is well under 1K. So I don't think it is network...
>>> plus all boxes are on a fast LAN.
>>>
>>> Bill-
>>>
>>> On Feb 10, 2011 11:59 AM, "Utku Can Topçu" <utku@topcu.gen.tr> wrote:
>>>> Dear Bill,
>>>>
>>>> How about the size of the row in the Messages CF? Is it too big? Might
>>>> you be running into bandwidth overhead?
>>>>
>>>> Regards,
>>>> Utku
>>>>
>>>> On Thu, Feb 10, 2011 at 5:00 PM, Bill Speirs <bill.speirs@gmail.com>
>>>> wrote:
>>>>
>>>>> I have a 7-node setup with a replication factor of 1 and a read
>>>>> consistency of 1. I have two column families: Messages, which stores
>>>>> millions of rows with a UUID for the row key, and DateIndex, which
>>>>> stores thousands of rows with a String as the row key. I perform 2
>>>>> look-ups for my queries:
>>>>>
>>>>> 1) Fetch the row from DateIndex that includes the date I'm looking
>>>>> for. This returns 1,000 columns where the column names are the UUID of
>>>>> the messages
>>>>> 2) Do a multi-get (Hector client) using those 1,000 row keys I got
>>>>> from the first query.
>>>>>
>>>>> Query 1 is taking ~300ms to fetch 1,000 columns from a single row...
>>>>> respectable. However, query 2 is taking over 50s to perform 1,000 row
>>>>> look-ups! Also, when I scale down to 100 row look-ups for query 2, the
>>>>> time scales in a similar fashion, down to 5s.
>>>>>
>>>>> Am I doing something wrong here? It seems like taking 5s to look-up
>>>>> 100 rows in a distributed hash table is way too slow.
>>>>>
>>>>> Thoughts?
>>>>>
>>>>> Bill-
>>>>>
>>>
>>
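
The thread-pool workaround described in the quoted 10:30 AM message (10 threads, each multi-getting 10 keys) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code used in the thread: the class name BatchedMultiGet, the partition and fetchAll helpers, and the multiGet function parameter are all hypothetical, and the stub in main stands in for a real client call (for example a Hector multi-get), which is not shown here.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical helper: splits a large key list into small batches and
// issues one multi-get per batch on a fixed-size thread pool.
final class BatchedMultiGet {

    /** Splits keys into consecutive batches of at most batchSize elements. */
    static <K> List<List<K>> partition(List<K> keys, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<K>> batches = new ArrayList<>();
        for (int i = 0; i < keys.size(); i += batchSize) {
            int end = Math.min(i + batchSize, keys.size());
            batches.add(new ArrayList<>(keys.subList(i, end)));
        }
        return batches;
    }

    /**
     * Runs multiGet once per batch, in parallel, and merges the results.
     * multiGet is an assumed placeholder for whatever client call fetches
     * a batch of rows by key; it is not a real Hector or Cassandra API.
     */
    static <K, V> Map<K, V> fetchAll(List<K> keys, int batchSize, int threads,
                                     Function<List<K>, Map<K, V>> multiGet)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Map<K, V>>> futures = new ArrayList<>();
            for (List<K> batch : partition(keys, batchSize)) {
                futures.add(pool.submit(() -> multiGet.apply(batch)));
            }
            Map<K, V> rows = new HashMap<>();
            for (Future<Map<K, V>> future : futures) {
                rows.putAll(future.get()); // blocks until that batch is done
            }
            return rows;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // 1,000 placeholder keys, standing in for the UUIDs read from DateIndex.
        List<Integer> keys = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            keys.add(i);
        }
        // Stub fetcher: a real one would call the Cassandra client here.
        Function<List<Integer>, Map<Integer, String>> stub = batch -> {
            Map<Integer, String> out = new HashMap<>();
            for (Integer k : batch) {
                out.put(k, "row-" + k);
            }
            return out;
        };
        Map<Integer, String> rows = fetchAll(keys, 10, 10, stub);
        System.out.println("fetched " + rows.size() + " rows");
    }
}
```

With 1,000 keys, a batch size of 10 and 10 threads, this submits 100 small multi-gets that run at most 10 at a time. The batch size and thread count are the values mentioned in the thread; whether they are the best choice for a given cluster is something to measure.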
