From: Bill Speirs <bill.speirs@gmail.com>
To: user@cassandra.apache.org
Date: Thu, 10 Feb 2011 12:53:35 -0500
Subject: Re: Super Slow Multi-gets

Each message row is well under 1K. So I don't think it is network... plus all boxes are on a fast LAN.

Bill-

On Feb 10, 2011 11:59 AM, "Utku Can Topçu" <utku@topcu.gen.tr> wrote:
> Dear Bill,
>
> How about the size of the row in the Messages CF. Is it too big? Might you
> be having an overhead of the bandwidth?
>
> Regards,
> Utku
>
> On Thu, Feb 10, 2011 at 5:00 PM, Bill Speirs <bill.speirs@gmail.com> wrote:
>
>> I have a 7 node setup with a replication factor of 1 and a read
>> consistency of 1. I have two column families: Messages which stores
>> millions of rows with a UUID for the row key, DateIndex which stores
>> thousands of rows with a String as the row key. I perform 2 look-ups
>> for my queries:
>>
>> 1) Fetch the row from DateIndex that includes the date I'm looking
>> for. This returns 1,000 columns where the column names are the UUID of
>> the messages
>> 2) Do a multi-get (Hector client) using those 1,000 row keys I got
>> from the first query.
>>
>> Query 1 is taking ~300ms to fetch 1,000 columns from a single row...
>> respectable. However, query 2 is taking over 50s to perform 1,000 row
>> look-ups! Also, when I scale down to 100 row look-ups for query 2, the
>> time scales in a similar fashion, down to 5s.
>>
>> Am I doing something wrong here? It seems like taking 5s to look-up
>> 100 rows in a distributed hash table is way too slow.
>>
>> Thoughts?
>>
>> Bill-
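
For reference, the two-step lookup described in the quoted message, and the arithmetic behind "the time scales in a similar fashion", can be sketched roughly as below. This is only an illustration under stated assumptions: plain Python dicts stand in for the Messages and DateIndex column families, and every function name here is hypothetical. Nothing in it calls Hector or Cassandra.

```python
import uuid


def build_sample_data(n_messages, date_key="2011-02-10"):
    """Build in-memory stand-ins for the two column families (assumption:
    dicts in place of Cassandra; the date key format is made up)."""
    messages = {}      # Messages: row key (UUID) -> columns
    index_columns = {} # one DateIndex row: column name (UUID) -> empty value
    for i in range(n_messages):
        key = uuid.uuid4()
        messages[key] = {"body": f"message {i}"}
        index_columns[key] = b""
    date_index = {date_key: index_columns}  # DateIndex: String key -> columns
    return messages, date_index


def fetch_index_row(date_index, date_key):
    """Query 1: read one DateIndex row; its column names are message UUIDs."""
    return list(date_index[date_key].keys())


def multi_get(messages, row_keys):
    """Query 2: fetch every message row named by the index, in one batch."""
    return {key: messages[key] for key in row_keys if key in messages}


def per_item_latency_ms(total_ms, item_count):
    """Average time per item, given the total time for the whole request."""
    return total_ms / item_count


if __name__ == "__main__":
    messages, date_index = build_sample_data(1000)
    keys = fetch_index_row(date_index, "2011-02-10")
    rows = multi_get(messages, keys)
    print(len(keys), len(rows))

    # Timings reported in the thread, converted to milliseconds.
    print(per_item_latency_ms(300, 1000))     # query 1, per column
    print(per_item_latency_ms(50_000, 1000))  # query 2, 1,000 rows
    print(per_item_latency_ms(5_000, 100))    # query 2, 100 rows
```

With the timings from the thread, both multi-get measurements come out at about 50 ms per row, against about 0.3 ms per column for the single-row read. The per-row cost staying constant as the batch shrinks is what "scales in a similar fashion" amounts to; the sketch does not say anything about why it is that high.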