From: aaron morton <aaron@thelastpickle.com>
Subject: Re: Unable to fetch large amount of rows
Date: Fri, 22 Mar 2013 06:41:13 +1300
To: user@cassandra.apache.org

> + Did run cfhistograms, the results are interesting (Note: row cache is
> disabled):

SSTables in cfhistograms is a friend here. It tells you how many sstables were read from per read; if it's above 3 I take a look at the data model. In your case I would be wondering how long that row with the timestamp is written to. Is it spread over many sstables?

> + 75% time is spent on disk latency

Do you mean 75% of the latency reported by proxyhistograms is also reported by cfhistograms?

> +++ When query made on node on which all the records are not present

Do you mean the co-ordinator for the request was not a replica for the row?

> + If my query is
>
>        - select * from schema where timestamp = '..' ORDER BY MacAddress,
> would that be faster than, say
>
>        - select * from schema where timestamp = '..'

As usual in a DB, it's faster to not re-order things. I'd have to check if the ORDER BY will no-op when it's the same as the clustering columns; for now let's just keep it out.

> 2) Why does response time suffer when query is made on a node on which
> records to be returned are not present?
> In order to be able to get better
> response when queried from a different node, can something be done?

During a read, one node is asked to return the data and the others to return a digest of their data. When the read runs on a node that is a replica, the data read is done locally and the others are asked for a digest; this can lead to better performance. If you are asking for a large row this will have a larger impact.

Astyanax can direct reads to nodes which are replicas.

Cheers

-----------------
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 21/03/2013, at 4:48 PM, Pushkar Prasad <pushkar.prasad@airtightnetworks.net> wrote:

> Yes, I'm reading from a single partition.
>
> -----Original Message-----
> From: Hiller, Dean [mailto:Dean.Hiller@nrel.gov]
> Sent: 21 March 2013 01:38
> To: user@cassandra.apache.org
> Subject: Re: Unable to fetch large amount of rows
>
> Is your use case reading from a single partition? If so, you may want to
> switch to something like PlayOrm, which does virtual partitions so you still
> get the performance of multiple disks when reading from a single partition.
> My understanding is that a single Cassandra partition exists on a single node.
> Anyway, just an option if that is your use case.
>
> Later,
> Dean
>
> From: Pushkar Prasad <pushkar.prasad@airtightnetworks.net>
> Reply-To: user@cassandra.apache.org
> Date: Wednesday, March 20, 2013 11:41 AM
> To: user@cassandra.apache.org
> Subject: RE: Unable to fetch large amount of rows
>
> Hi Aaron,
>
> I added pagination, and things seem to have started performing much better.
> With a 1000 page size, I'm now able to fetch 500K records in 25-30 seconds.
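The paging Pushkar added can be sketched as follows. This is a minimal Python stand-in for the idea, not his actual Astyanax code; the slice function, tuple shape, and page size are assumptions for illustration:

```python
# Sketch of client-side paging over one wide partition. An Astyanax-style
# slice query is replaced by a plain function; everything here is an
# illustrative stand-in, not real driver code.

def fetch_partition_paged(fetch_page, page_size=1000):
    """Read one wide partition in fixed-size pages.

    fetch_page(start, limit) is assumed to behave like a Cassandra
    column-slice query: it returns up to `limit` (clustering_key, value)
    pairs with clustering_key > start, in comparator order.
    """
    results = []
    start = None  # None = from the beginning of the row
    while True:
        page = fetch_page(start, page_size)
        results.extend(page)
        if len(page) < page_size:
            break  # short page: the partition is exhausted
        start = page[-1][0]  # resume after the last key we saw
    return results

# Stand-in for the server side: one partition with 2500 columns.
partition = [(mac, "data-%d" % mac) for mac in range(2500)]

def fake_slice(start, limit):
    rows = [c for c in partition if start is None or c[0] > start]
    return rows[:limit]

all_columns = fetch_partition_paged(fake_slice, page_size=1000)
```

Many small, constant-size requests like this bound the memory and latency of each round trip, which is why paging avoids the timeouts seen with a single 500K-column read.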
> However, I'd like to point you to some interesting observations:
>
> + Did run cfhistograms, the results are interesting (Note: row cache is
> disabled):
> +++ When query made on node on which all the records are present
>        + 75% time is spent on disk latency
>        + Example: When 50K entries were fetched, it took 2.65 seconds, out
> of which 1.92 seconds were spent in disk latency
> +++ When query made on node on which all the records are not present
>        + Considerable amount of time is spent on things other than disk
> latency (probably deserialization/serialization, network, etc.)
>        + Example: When 50K entries were fetched, it took 5.74 seconds, out
> of which 2.21 seconds were spent in disk latency.
>
> I've used Astyanax to run the above queries. The results were the same when run
> with different data points. Compaction has not been done after data
> population yet.
>
> I have a few questions:
> 1) Is it necessary to fetch the records in the natural order of the comparator
> column in order to get high throughput? I'm trying to fetch all the
> records for a particular partition ID without any ordering on the comparator
> column. Would that slow down the response? Consider that timestamp is the
> partition ID, and MacAddress is the natural comparator column.
>    + If my query is
>        - select * from schema where timestamp = '..' ORDER BY MacAddress,
> would that be faster than, say
>        - select * from schema where timestamp = '..'
> 2) Why does response time suffer when query is made on a node on which
> records to be returned are not present? In order to be able to get better
> response when queried from a different node, can something be done?
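On question 1, the principle Aaron describes at the top of the thread can be sketched like this. It is an illustration of why an ORDER BY matching the clustering order can be free while any other order forces a sort, not a model of Cassandra's actual read path:

```python
# Columns in a partition are stored in comparator (clustering) order, so
# returning them in that order costs nothing extra; asking for any other
# order forces a re-sort somewhere. Illustrative sketch only.

def read_partition(rows, clustering_key, order_by=None):
    """rows are assumed pre-sorted by clustering_key, as on disk."""
    if order_by is None or order_by == clustering_key:
        return rows  # already in comparator order: nothing to do
    return sorted(rows, key=lambda r: r[order_by])

# Stand-in partition, stored sorted by "mac" (the clustering column).
stored = [{"mac": m, "rate": 100 - m} for m in range(5)]

same_order = read_partition(stored, "mac", order_by="mac")   # no-op
resorted = read_partition(stored, "mac", order_by="rate")    # needs a sort
```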
>
> Thanks
> Pushkar
> ________________________________
> From: aaron morton [mailto:aaron@thelastpickle.com]
> Sent: 20 March 2013 15:02
> To: user@cassandra.apache.org
> Subject: Re: Unable to fetch large amount of rows
>
> The query returns fine if I request a smaller number of entries (takes 15
> seconds for returning 20K records).
> That feels a little slow, but it depends on the data model, the query type,
> the server, and a bunch of other things.
>
> However, as I increase the limit on
> number of entries, the response begins to slow down. It results in
> TimedOutException.
> Make many smaller requests.
> This is often faster.
>
> Isn't it the case that all the data for a partition ID is stored sequentially
> on disk?
> Yes and no.
> In each file all the columns of one partition / row are stored in comparator
> order. But there may be many files.
>
> If that is so, then why does fetching this data take such a long
> amount of time?
> You need to work out where the time is being spent.
> Add timing to your app, use nodetool proxyhistograms to see how long the
> requests take at the co-ordinator, and use nodetool cfhistograms to see how long
> they take at the disk level.
>
> Look at your data model: are you reading data in the natural order of the
> comparator?
>
> If disk throughput is 40 MB/s, then assuming sequential
> reads, the response should come pretty quickly.
> There is more involved than doing one read from disk and returning it.
>
> If it is stored
> sequentially, why does C* take so much time to return the records?
> It is always going to take time to read 500,000 columns. It will take time
> on the client to allocate the 2 to 4 million objects needed to represent
> them. And once it comes to allocating those objects it will probably take
> more than 40 MB in RAM.
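Aaron's object-count figure can be sanity-checked with a back-of-envelope calculation. The per-column object count and per-object size below are assumed, illustrative numbers (name, value, timestamp plus a wrapper, at JVM-like overheads), not measurements:

```python
# Rough client-side heap estimate for materialising a wide row.
# objects_per_column and bytes_per_object are guesses for illustration.

def client_heap_estimate(n_columns, objects_per_column=4, bytes_per_object=48):
    n_objects = n_columns * objects_per_column
    return n_objects, n_objects * bytes_per_object

objects, heap_bytes = client_heap_estimate(500_000)
# At these assumed overheads: 2,000,000 objects and ~96 MB of heap,
# consistent with "2 to 4 million objects" and "more than 40 MB in RAM".
```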
>
> Do some tests at a smaller scale: start with 500 or 1000 columns, then get
> bigger, to get a feel for what is practical in your environment. Often it's
> better to make many smaller / constant-size requests.
>
> Cheers
>
> -----------------
> Aaron Morton
> Freelance Cassandra Consultant
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
> On 19/03/2013, at 9:38 PM, Pushkar Prasad <pushkar.prasad@airtightnetworks.net> wrote:
>
> Aaron,
>
> Thanks for your reply. Here are the answers to the questions you had asked:
>
> I am trying to read all the rows which have a particular TimeStamp. In my
> database, there are 500K entries for a particular TimeStamp. That means
> about 40 MB of data.
>
> The query returns fine if I request a smaller number of entries (takes 15
> seconds for returning 20K records). However, as I increase the limit on the
> number of entries, the response begins to slow down. It results in
> TimedOutException.
>
> Isn't it the case that all the data for a partition ID is stored sequentially
> on disk? If that is so, then why does fetching this data take such a long
> amount of time? If disk throughput is 40 MB/s, then assuming sequential
> reads, the response should come pretty quickly. Is it not the case that the
> data I am trying to fetch would be sequentially stored? If it is stored
> sequentially, why does C* take so much time to return the records? And if
> data is stored sequentially, is there any alternative that would allow me to
> fetch all the records quickly (by sequential disk fetch)?
>
> Thanks
> Pushkar
>
> -----Original Message-----
> From: aaron morton [mailto:aaron@thelastpickle.com]
> Sent: 19 March 2013 13:11
> To: user@cassandra.apache.org
> Subject: Re: Unable to fetch large amount of rows
>
> I have 1000 timestamps, and for each timestamp, I have 500K different
> MACAddresses.
> So you are trying to read about 2 million columns?
> 500K MACAddresses, each with 3 other columns?
>
> When I run the following query, I get RPC Timeout exceptions:
> What is the exception?
> Is it a client-side socket timeout or a server-side TimedOutException?
>
> If my understanding is correct, then try reading fewer columns and/or check
> the server side for logs. It sounds like you are trying to read too much,
> though.
>
> Cheers
>
> -----------------
> Aaron Morton
> Freelance Cassandra Consultant
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
> On 19/03/2013, at 3:51 AM, Pushkar Prasad <pushkar.prasad@airtightnetworks.net> wrote:
>
> Hi,
>
> I have the following schema:
>
> TimeStamp
> MACAddress
> Data Transfer
> Data Rate
> LocationID
>
> PKEY is (TimeStamp, MACAddress). That means partitioning is on TimeStamp,
> and data is ordered by MACAddress and stored together physically (let me
> know if my understanding is wrong). I have 1000 timestamps, and for each
> timestamp, I have 500K different MACAddresses.
>
> When I run the following query, I get RPC Timeout exceptions:
>
> Select * from db_table where Timestamp='...'
>
> From my understanding, this should give all the rows with just one disk
> seek, as all the records are for a particular TimeStamp. This should be very
> quick; however, clearly, that doesn't seem to be the case. Is there
> something I am missing here? Your help would be greatly appreciated.
>
> Thanks
> PP
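The mental model behind PKEY (TimeStamp, MACAddress), which the thread confirms as broadly correct, can be written down as a toy sketch: TimeStamp is the partition key (it picks the replica nodes), MACAddress is the clustering column (it orders cells within the partition). This is purely illustrative, not Cassandra internals:

```python
# Toy model of the storage layout for PKEY (TimeStamp, MACAddress).
from collections import defaultdict

table = defaultdict(dict)  # partition key -> {clustering key: other columns}

def insert_row(ts, mac, data_transfer, data_rate, location_id):
    table[ts][mac] = (data_transfer, data_rate, location_id)

def select_by_timestamp(ts):
    """Like `select * from db_table where Timestamp = ts`: one partition,
    returned in MACAddress (comparator) order, mirroring on-disk layout."""
    return sorted(table[ts].items())

insert_row("t1", "aa:02", 10, 54, "L1")
insert_row("t1", "aa:01", 20, 11, "L2")
insert_row("t2", "aa:01", 30, 54, "L3")

rows = select_by_timestamp("t1")
```

The catch discussed above is that one logical partition may still be spread over several sstables on disk, so "one disk seek" is the best case, not a guarantee.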