incubator-cassandra-user mailing list archives

From "Pushkar Prasad" <pushkar.pra...@airtightnetworks.net>
Subject RE: Unable to fetch large amount of rows
Date Tue, 19 Mar 2013 08:38:15 GMT
Aaron,

Thanks for your reply. Here are the answers to questions you had asked:

I am trying to read all the rows that have a particular TimeStamp. In my
database, there are 500K entries for a particular TimeStamp, which amounts
to about 40 MB of data.

The query returns fine if I request fewer entries (it takes 15 seconds to
return 20K records). However, as I increase the limit on the number of
entries, the response slows down and eventually fails with a
TimedOutException.

Isn't it the case that all the data for a partitionID is stored sequentially
on disk? If so, why does fetching this data take so long? If disk throughput
is 40 MB/s, then assuming sequential reads, the response should come back
quickly. Is the data I am trying to fetch not stored sequentially? If it is,
why does C* take so much time to return the records, and is there any
alternative that would let me fetch all the records quickly (by a sequential
disk read)?
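
As a rough check of the figures quoted in this message, the arithmetic can be sketched as below. This is only a back-of-the-envelope estimate: it assumes 1 MB = 10**6 bytes and assumes the observed rate (20K records in 15 seconds) scales linearly, which the message itself suggests is optimistic because the response slows as the limit grows. The variable names are illustrative.

```python
# Figures quoted in the message; MB is taken as 10**6 bytes (an assumption).
rows_total = 500_000
bytes_total = 40 * 10**6
disk_bytes_per_s = 40 * 10**6          # hypothetical throughput from the question
rows_observed, secs_observed = 20_000, 15

# Average size of one entry.
bytes_per_row = bytes_total / rows_total

# Time for a purely sequential disk read of the whole partition.
sequential_read_s = bytes_total / disk_bytes_per_s

# Linear extrapolation of the observed query rate to all 500K entries.
extrapolated_s = rows_total * secs_observed / rows_observed

print(f"{bytes_per_row:.0f} bytes per row")
print(f"{sequential_read_s:.0f} s for a sequential disk read")
print(f"{extrapolated_s:.0f} s at the observed query rate")
```

The gap between the two estimates suggests that raw disk throughput is not what limits the query here.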

Thanks
Pushkar

-----Original Message-----
From: aaron morton [mailto:aaron@thelastpickle.com] 
Sent: 19 March 2013 13:11
To: user@cassandra.apache.org
Subject: Re: Unable to fetch large amount of rows

> I have 1000 timestamps, and for each timestamp, I have 500K different
> MACAddress.
So you are trying to read about 2 million columns?
That is, 500K MACAddresses, each with 3 other columns?

> When I run the following query, I get RPC Timeout exceptions:
What is the exception?
Is it a client-side socket timeout or a server-side TimedOutException?

If my understanding is correct, then try reading fewer columns and/or check
the server-side logs. It sounds like you are trying to read too much,
though.
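
One way to read fewer columns per request is to walk the partition in bounded slices, restarting each slice after the last MACAddress seen. The sketch below only illustrates that loop: `fetch_slice` and `read_partition` are hypothetical names, and `fetch_slice` is an in-memory stand-in for a bounded query of the assumed form `SELECT * FROM db_table WHERE Timestamp = ? AND MACAddress > ? LIMIT ?`, not a real driver call.

```python
from bisect import bisect_right

def fetch_slice(rows, start_after, limit):
    """Hypothetical stand-in for one bounded query against a partition.

    `rows` is a list of (mac_address, data) tuples sorted by mac_address,
    mimicking a partition ordered by its clustering column.
    """
    keys = [mac for mac, _ in rows]
    # Start strictly after the last key already returned.
    start = 0 if start_after is None else bisect_right(keys, start_after)
    return rows[start:start + limit]

def read_partition(rows, page_size):
    """Yield every row of the partition, one bounded slice at a time."""
    last_mac = None
    while True:
        page = fetch_slice(rows, last_mac, page_size)
        if not page:
            return
        yield from page
        # Remember where this slice ended so the next one resumes after it.
        last_mac = page[-1][0]

# Small made-up partition: 50 rows with fixed-width, sortable MAC strings.
partition = [(f"00:00:00:00:00:{i:02x}", i) for i in range(50)]
result = list(read_partition(partition, page_size=20))
print(len(result))
```

Each request then stays small enough to finish within the RPC timeout, at the cost of more round trips.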

Cheers



-----------------
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 19/03/2013, at 3:51 AM, Pushkar Prasad
<pushkar.prasad@airtightnetworks.net> wrote:

> Hi,
>  
> I have the following schema:
>  
> TimeStamp
> MACAddress
> Data Transfer
> Data Rate
> LocationID
>  
> PKEY is (TimeStamp, MACAddress). That means partitioning is on TimeStamp,
> and data is ordered by MACAddress, and stored together physically (let me
> know if my understanding is wrong). I have 1000 timestamps, and for each
> timestamp, I have 500K different MACAddress.
>  
> When I run the following query, I get RPC Timeout exceptions:
>  
>  
> Select * from db_table where Timestamp='...'
>  
> From my understanding, this should give all the rows with just one disk
> seek, as all the records for a particular TimeStamp are stored together.
> This should be very quick; however, that clearly doesn't seem to be the
> case. Is there something I am missing here? Your help would be greatly
> appreciated.
>  
> Thanks
> PP



