incubator-cassandra-user mailing list archives

From Utku Can Topçu <u...@topcu.gen.tr>
Subject Re: Super Slow Multi-gets
Date Fri, 11 Feb 2011 07:53:06 GMT
Bill,
It still sounds really strange.

Can you reproduce it and note down the steps? I'm sure people here would be
glad to try repeating them.

Regards,
Utku

On Fri, Feb 11, 2011 at 5:34 AM, Mark Guzman <segfault@hasno.info> wrote:

> I assume this should be set on all of the servers? Is there anything in
> particular one would look for in the log results?
>
> On Feb 10, 2011, at 4:37 PM, Aaron Morton wrote:
>
> Assuming Cassandra 0.7, edit log4j-server.properties so the root logger line looks like this:
>
> log4j.rootLogger=DEBUG,stdout,R
>
>
> A
> On 11 Feb, 2011, at 10:30 AM, Bill Speirs <bill.speirs@gmail.com> wrote:
>
> I switched my implementation to use a thread pool of 10 threads each
> multi-getting 10 keys/rows. This reduces my time from 50s to 5s for
> fetching all 1,000 messages.
>
> I started looking through the Cassandra source to find where the
> parallel requests are actually made, and I believe it's in
> org.apache.cassandra.service.StorageProxy.java fetchRows. Is this
> correct? I noticed a number of logger.debug calls. What do I need to
> set in my log4j.properties file to see these messages? They would
> probably help me determine what is taking so long. Currently my
> log4j.properties file looks like this, and I'm not seeing these
> messages:
>
> log4j.appender.stdout=org.apache.log4j.ConsoleAppender
> log4j.appender.stdout.layout=org.apache.log4j.SimpleLayout
> log4j.category.org.apache=DEBUG, stdout
> log4j.category.me.prettyprint=DEBUG, stdout
>
> Thanks...
>
> Bill-
>
>
> On Thu, Feb 10, 2011 at 12:53 PM, Bill Speirs <bill.speirs@gmail.com> wrote:
> > Each message row is well under 1K. So I don't think it is network... plus
> > all boxes are on a fast LAN.
> >
> > Bill-
> >
> > On Feb 10, 2011 11:59 AM, "Utku Can Topçu" <utku@topcu.gen.tr> wrote:
> >> Dear Bill,
> >>
> >> How about the size of the rows in the Messages CF? Are they too big? Might
> >> bandwidth be the bottleneck?
> >>
> >> Regards,
> >> Utku
> >>
> >> On Thu, Feb 10, 2011 at 5:00 PM, Bill Speirs <bill.speirs@gmail.com> wrote:
> >>
> >>> I have a 7-node setup with a replication factor of 1 and a read
> >>> consistency of 1. I have two column families: Messages, which stores
> >>> millions of rows with a UUID for the row key, and DateIndex, which stores
> >>> thousands of rows with a String as the row key. I perform 2 look-ups
> >>> for my queries:
> >>>
> >>> 1) Fetch the row from DateIndex that includes the date I'm looking
> >>> for. This returns 1,000 columns where the column names are the UUIDs of
> >>> the messages.
> >>> 2) Do a multi-get (Hector client) using those 1,000 row keys I got
> >>> from the first query.
> >>>
> >>> Query 1 is taking ~300ms to fetch 1,000 columns from a single row...
> >>> respectable. However, query 2 is taking over 50s to perform 1,000 row
> >>> look-ups! Also, when I scale down to 100 row look-ups for query 2, the
> >>> time scales in a similar fashion, down to 5s.
> >>>
> >>> Am I doing something wrong here? It seems like taking 5s to look-up
> >>> 100 rows in a distributed hash table is way too slow.
> >>>
> >>> Thoughts?
> >>>
> >>> Bill-
> >>>
> >
>
>
>
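
The workaround Bill describes in the thread (a pool of 10 threads, each multi-getting 10 keys) can be sketched roughly as below. This is only an illustration under assumptions, not Bill's actual code: the class name BatchedMultiGet, the helpers partition and fetchAll, and the multiGet function parameter are all hypothetical, and multiGet stands in for whatever client call (for example a Hector multi-get) fetches one batch of rows. No real Hector API is used here.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical helper: splits a large multi-get into small batches
// and runs the batches concurrently on a fixed-size thread pool.
public class BatchedMultiGet {

    /** Splits keys into consecutive batches of at most batchSize elements. */
    static <K> List<List<K>> partition(List<K> keys, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<K>> batches = new ArrayList<>();
        for (int i = 0; i < keys.size(); i += batchSize) {
            int end = Math.min(i + batchSize, keys.size());
            batches.add(new ArrayList<>(keys.subList(i, end)));
        }
        return batches;
    }

    /**
     * Fetches all keys by submitting one task per batch.
     * multiGet is an assumed stand-in for the real client call that
     * returns the rows for one batch of keys.
     */
    static <K, V> Map<K, V> fetchAll(List<K> keys, int batchSize, int threads,
                                     Function<List<K>, Map<K, V>> multiGet)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Map<K, V>>> futures = new ArrayList<>();
            for (List<K> batch : partition(keys, batchSize)) {
                Callable<Map<K, V>> task = () -> multiGet.apply(batch);
                futures.add(pool.submit(task));
            }
            // Merge results on the calling thread once each batch completes.
            Map<K, V> merged = new HashMap<>();
            for (Future<Map<K, V>> future : futures) {
                merged.putAll(future.get());
            }
            return merged;
        } finally {
            pool.shutdown();
        }
    }
}
```

With 1,000 keys, a batch size of 10, and 10 threads, this submits 100 batch tasks, matching the setup Bill reports. Whether it helps in a given cluster depends on where the time is actually going, which is what the DEBUG logging discussed above is meant to reveal.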
