From: Utku Can Topçu
To: user@cassandra.apache.org
Date: Thu, 10 Feb 2011 17:58:31 +0100
Subject: Re: Super Slow Multi-gets

Dear Bill,

How about the size of the rows in the Messages CF? Are they too big? Could you be running into a bandwidth bottleneck?
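
A rough back-of-envelope check of the bandwidth idea, using Bill's timings quoted below (1,000 rows in ~50 s, 100 rows in ~5 s). The 100 Mbit/s link speed is a hypothetical assumption, not a figure from this thread; the sketch only shows how large each row would have to be for bandwidth alone to account for the observed time per row.

```java
// Back-of-envelope check: could bandwidth alone explain the multi-get latency?
// The timings come from the thread; LINK_MBIT_PER_S is a hypothetical assumption.
public class BandwidthCheck {
    // Assumed network link speed (hypothetical, not from the thread).
    static final double LINK_MBIT_PER_S = 100.0;

    /** Average time spent per row, in milliseconds. */
    static double msPerRow(double totalSeconds, int rows) {
        return totalSeconds * 1000.0 / rows;
    }

    /** Row size in KB (1 KB = 1000 bytes) that would keep the link busy for msPerRow per row. */
    static double rowKbToSaturate(double msPerRow, double linkMbitPerS) {
        double bytesPerSecond = linkMbitPerS * 1_000_000 / 8; // Mbit/s -> bytes/s
        return bytesPerSecond * (msPerRow / 1000.0) / 1000.0;  // bytes -> KB
    }

    public static void main(String[] args) {
        double perRow1000 = msPerRow(50.0, 1000); // 1,000 rows in ~50 s
        double perRow100 = msPerRow(5.0, 100);    // 100 rows in ~5 s
        System.out.printf("ms/row: %.1f (1,000 rows), %.1f (100 rows)%n", perRow1000, perRow100);
        System.out.printf("row size needed to saturate a %.0f Mbit/s link: ~%.0f KB%n",
                LINK_MBIT_PER_S, rowKbToSaturate(perRow1000, LINK_MBIT_PER_S));
    }
}
```

Both timings work out to the same ~50 ms per row, so the cost grows linearly with the number of keys. On the assumed link, each row would need to be roughly 625 KB before bandwidth alone could explain that; much smaller rows would point elsewhere.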

Regards,
Utku

On Thu, Feb 10, 2011 at 5:00 PM, Bill Speirs <bill.speirs@gmail.com> wrote:
> I have a 7-node setup with a replication factor of 1 and a read
> consistency of 1. I have two column families: Messages, which stores
> millions of rows with a UUID for the row key, and DateIndex, which stores
> thousands of rows with a String as the row key. I perform 2 look-ups
> for my queries:
>
> 1) Fetch the row from DateIndex that includes the date I'm looking
> for. This returns 1,000 columns where the column names are the UUIDs of
> the messages.
> 2) Do a multi-get (Hector client) using those 1,000 row keys I got
> from the first query.
>
> Query 1 is taking ~300ms to fetch 1,000 columns from a single row...
> respectable. However, query 2 is taking over 50s to perform 1,000 row
> look-ups! Also, when I scale down to 100 row look-ups for query 2, the
> time scales in a similar fashion, down to 5s.
>
> Am I doing something wrong here? It seems like taking 5s to look up
> 100 rows in a distributed hash table is way too slow.
>
> Thoughts?
>
> Bill-
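
The two-step lookup Bill describes can be modelled with plain in-memory maps standing in for the DateIndex and Messages column families. This is only a sketch of the data layout: the class, its methods and the sample date are hypothetical, random UUIDs are used for simplicity, and it does not use Hector or any Cassandra API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

// In-memory sketch of the DateIndex -> Messages lookup pattern.
// Both maps are hypothetical stand-ins for the column families; no Cassandra client is used.
public class DateIndexLookup {
    // DateIndex: row key is a date string, column names are message UUIDs.
    final Map<String, List<UUID>> dateIndex = new HashMap<>();
    // Messages: row key is the message UUID, value is the message body.
    final Map<UUID, String> messages = new HashMap<>();

    // Write a message row and record its UUID as a column in the date's index row.
    void insert(String date, String body) {
        UUID id = UUID.randomUUID();
        messages.put(id, body);
        dateIndex.computeIfAbsent(date, d -> new ArrayList<>()).add(id);
    }

    // Step 1: read one DateIndex row; its column names are the message row keys.
    List<UUID> messageKeysFor(String date) {
        return new ArrayList<>(dateIndex.getOrDefault(date, Collections.emptyList()));
    }

    // Step 2: "multi-get" the Messages rows for those keys, skipping any that are missing.
    Map<UUID, String> multiGet(List<UUID> keys) {
        Map<UUID, String> result = new LinkedHashMap<>();
        for (UUID key : keys) {
            String body = messages.get(key);
            if (body != null) {
                result.put(key, body);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        DateIndexLookup store = new DateIndexLookup();
        for (int i = 0; i < 1000; i++) {
            store.insert("2011-02-10", "message " + i); // hypothetical sample date
        }
        List<UUID> keys = store.messageKeysFor("2011-02-10");
        Map<UUID, String> rows = store.multiGet(keys);
        System.out.println(keys.size() + " index columns, " + rows.size() + " message rows");
    }
}
```

Against the real cluster, a replication factor of 1 means each Messages row lives on exactly one of the 7 nodes, so step 2 typically has to reach many nodes; the in-memory lookups here model none of that network cost.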
