Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@cassandra.apache.org
Received-SPF: pass (athena.apache.org: local policy)
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0 (Apple Message framework v1078)
Subject: Re: Are 6..8 seconds to read 23.000 small rows - as it should be?
From: Per Olesen <pol@trifork.com>
In-Reply-To: <AANLkTil_1Fc-mU1V9GqG2yEAJEcuch499VXGi5-HINbC@mail.gmail.com>
Date: Fri, 4 Jun 2010 20:20:04 +0200
Content-Transfer-Encoding: quoted-printable
Message-ID: <D628D88C-21BF-44BE-BDCD-ACA489BC7D9D@trifork.com>
References: <1FCCC842-2F1F-4468-AB93-1A73FE9CD18A@trifork.com>
 <AANLkTincjAAX9zt3_oJIQYN3Yv75LZeyB847tG7O_b0X@mail.gmail.com>
 <B39B34CD-E49B-40DD-B55E-78F9EAC96668@trifork.com>
 <AANLkTil_1Fc-mU1V9GqG2yEAJEcuch499VXGi5-HINbC@mail.gmail.com>
To: "user@cassandra.apache.org" <user@cassandra.apache.org>


On Jun 4, 2010, at 5:19 PM, Ben Browning wrote:

> How many subcolumns are in each supercolumn and how large are the
> values? Your example shows 8 subcolumns, but I didn't know if that was
> the actual number. I've been able to read columns out of Cassandra at
> an order of magnitude higher than what you're seeing here but there
> are too many variables to directly compare.

There are very few columns for each SC: about 8, but it varies a bit.
The column names and values are pretty small, around 20-30 bytes for
each column, I guess. So we are talking about small amounts of data here.
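
To make "small amounts of data" concrete, here is a back-of-envelope
estimate built only from the figures in this thread. It assumes the
23,000 items are supercolumns with about 8 subcolumns each, and that
20-30 bytes covers name plus value; both are rough guesses, and the
Thrift encoding overhead is ignored.

```python
# Back-of-envelope payload estimate from the figures in this thread.
SUPERCOLUMNS = 23_000          # from the subject line
SUBCOLUMNS_PER_SC = 8          # "about 8, but it varies a bit"
BYTES_PER_COLUMN = (20, 30)    # name + value, a rough guess


def payload_bytes(bytes_per_column):
    """Total raw bytes if every subcolumn has the given size."""
    return SUPERCOLUMNS * SUBCOLUMNS_PER_SC * bytes_per_column


low, high = (payload_bytes(b) for b in BYTES_PER_COLUMN)
print(f"payload: {low / 1e6:.1f} to {high / 1e6:.1f} MB")

# Throughput bounds if the whole read takes 6 to 8 seconds.
print(f"throughput: {low / 8 / 1e6:.2f} to {high / 6 / 1e6:.2f} MB/s")
```

Under these assumptions the read moves only a few megabytes, at well
under 1 MB/s.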

Yes, I know there are too many variables, but I have the feeling - as
you also write - that the performance of this simple thing should be
orders of magnitude better.

So, how might I go about finding out why this takes so long in my
specific setup? Can I get timings of what happens inside Cassandra
itself?
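
One thing that can be measured without touching the server is how the
6 to 8 seconds split up on the client side. The sketch below is only
that: it times nothing inside Cassandra. The functions fetch_raw and
to_objects are hypothetical stand-ins for the real Thrift read and for
whatever conversion the client does afterwards.

```python
import time
from contextlib import contextmanager

timings = {}  # label -> accumulated seconds


@contextmanager
def timed(label):
    """Accumulate wall-clock time spent inside the with-block."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        timings[label] = timings.get(label, 0.0) + elapsed


def fetch_raw():
    # Hypothetical stand-in for the actual Thrift read call.
    return [("name%d" % i, "value%d" % i) for i in range(23_000)]


def to_objects(raw):
    # Hypothetical stand-in for client-side conversion of the result.
    return [{"name": n, "value": v} for n, v in raw]


with timed("fetch"):
    raw = fetch_raw()
with timed("convert"):
    rows = to_objects(raw)

for label, seconds in timings.items():
    print(f"{label}: {seconds * 1000:.1f} ms")
```

If nearly all the time lands in the fetch step, the client-side
conversion can be ruled out and the network and server are what is
left to look at.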

> Keep in mind that the results from each thrift call has to fit into
> memory - you might be better off paging through the 23000 columns,
> reading a few thousand at a time.

Yes, I know, and I might end up doing this. I do, though, have
pretty hard upper limits on how many rows I will end up with for each
key, but it might be a good idea nonetheless. Thanks for the
advice on that one.
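
A minimal sketch of the paging Ben suggests, reading a fixed number of
columns per call. It assumes a hypothetical callable
fetch_slice(start, count) that returns up to count (name, value) pairs
in name order, beginning at start inclusive, with "" meaning "from the
first column". That is how I understand a Thrift slice range to
behave, but the wrapper itself is made up for illustration. An
in-memory stand-in is used so the example runs without a cluster.

```python
import bisect


def page_columns(fetch_slice, page_size=1000):
    """Yield every (name, value) pair, fetching about page_size per call."""
    start = ""
    first_page = True
    while True:
        # Later pages begin at the last column already seen, so ask for
        # one extra and drop the repeat.
        count = page_size if first_page else page_size + 1
        page = fetch_slice(start, count)
        if not first_page and page and page[0][0] == start:
            page = page[1:]
        if not page:
            return
        yield from page
        if len(page) < page_size:
            return  # short page: nothing left to read
        start = page[-1][0]
        first_page = False


# In-memory stand-in for the real read call (hypothetical data).
data = sorted(("col%05d" % i, "v%d" % i) for i in range(23_000))
names = [name for name, _ in data]


def fake_fetch_slice(start, count):
    i = bisect.bisect_left(names, start)  # start is inclusive
    return data[i:i + count]


all_columns = list(page_columns(fake_fetch_slice, page_size=5000))
print(len(all_columns))
```

Each call then holds at most page_size + 1 columns in memory, at the
cost of more round trips.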

Per