incubator-cassandra-user mailing list archives

From <moshe.kr...@barclays.com>
Subject RE: Secondary Index on table with a lot of data crashes Cassandra
Date Thu, 25 Apr 2013 08:32:10 GMT
IMHO: user_name is not a column, it is the row key. Therefore, according to
http://thelastpickle.com/2011/07/04/Cassandra-Query-Plans/, the row does not contain a relevant
column index, which causes the iterator to read each column (including value) of each row.
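
For a rough sense of scale, if this theory is right, the figures quoted in the original message below (approx 50000 rows, 500K or more per big_json, over 90% of rows with status = 2) would work out roughly as follows. This is only a back-of-envelope sketch, not a measurement: treating 500K as 500 * 1024 bytes and "over 90%" as exactly 90% are my assumptions, and it says nothing about how Cassandra buffers the data internally.

```python
# Back-of-envelope estimate only; every constant is taken from, or assumed
# from, the figures in the original message.
ROWS = 50_000                  # "approx 50000 rows"
BIG_JSON_BYTES = 500 * 1024    # assumption: "500K" read as 500 KiB, the stated minimum
MATCH_PERCENT = 90             # assumption: "over 90%" taken as exactly 90

# Rows the status = 2 query would match.
matching_rows = ROWS * MATCH_PERCENT // 100

# Bytes of big_json touched if every matching row is read in full.
bytes_read = matching_rows * BIG_JSON_BYTES
gib = bytes_read / 1024**3

print(f"{matching_rows} matching rows, roughly {gib:.1f} GiB of big_json values")
```

Even if the real numbers differ a lot, a volume of that order would be consistent with the out-of-memory suspicion, whereas a keys-only read of the same rows would be tiny by comparison.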

I believe that instead of referring to user_name as if it were a column, you need to refer
to it via the reserved word "KEY", e.g.:


Select KEY from users where status = 2;

Always glad to share a theory with a friend....


From: Tamar Rosen [mailto:tamar@correlor.com]
Sent: Thursday, April 25, 2013 11:04 AM
To: user@cassandra.apache.org
Subject: Secondary Index on table with a lot of data crashes Cassandra

Hi,

We have a case of a reproducible crash, probably due to out of memory, but I don't understand
why.

The installation is currently single node.

We have a column family with approx 50000 rows.

In cql, the CF definition is:




CREATE TABLE users (
  user_name text PRIMARY KEY,
  big_json text,
  status int
);



Each big_json can have 500K or more of data.

There is also a secondary index on the status column.

Status can have various values; over 90% of all rows have status = 2.

Calling:

Select user_name from users limit 80000;

is pretty fast.

Calling:

Select user_name from users where status = 1;

is slower, even though much less data is returned.

Calling:

Select user_name from users where status = 2;

always crashes.





What are we doing wrong? Can it be that Cassandra is actually trying to read all the CF data
rather than just the keys? (Actually, it doesn't need to go to the users CF at all: all the
data it needs is in the index CF.)
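
To make the question concrete, here is a toy model of the two behaviours being contrasted. It is only an illustration under stated assumptions: plain Python dicts stand in for the users CF and the index CF, every name in it is hypothetical, and it is not a description of how Cassandra actually executes the query.

```python
# Toy model only: dicts standing in for the users CF and a status index CF.
users = {
    "alice": {"big_json": "x" * 500, "status": 2},
    "bob": {"big_json": "y" * 500, "status": 1},
    "carol": {"big_json": "z" * 500, "status": 2},
}

# Hypothetical index: status value -> list of row keys with that status.
status_index = {}
for key, row in users.items():
    status_index.setdefault(row["status"], []).append(key)


def keys_from_index_only(status):
    """Expected behaviour: answer from the index alone, never touching users."""
    return list(status_index.get(status, []))


def keys_via_full_rows(status):
    """Suspected behaviour: load each matching row in full, then keep only the key."""
    keys = []
    bytes_loaded = 0
    for key in status_index.get(status, []):
        row = users[key]                      # the whole row is read here
        bytes_loaded += len(row["big_json"])  # count the payload that came along
        keys.append(key)
    return keys, bytes_loaded
```

Both functions return the same keys; the only difference is how much big_json data is pulled in along the way, which is the cost the question is asking about.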





Also, in the code I am doing the same using Astyanax index query with pagination, and the
behavior is the same.



Please help me:

1. solve the immediate issue

2. understand if there is something in this use case which indicates that we are not using
Cassandra the way it is meant to be used.





Thanks,

Tamar Rosen
Correlor.com






