If you are able to reproduce the issue, file a ticket at https://issues.apache.org/jira/browse/CASSANDRA; in my experience, the developers respond quickly to issues that are clearly bugs.


Ondrej Cernos

On Thu, Apr 25, 2013 at 10:03 AM, Tamar Rosen <tamar@correlor.com> wrote:

We have a reproducible crash, probably caused by running out of memory, but I don't understand why.

The installation is currently a single node.

We have a column family with approximately 50,000 rows.

In CQL, the CF definition is:

  CREATE TABLE users (
    user_name text PRIMARY KEY,
    big_json text,
    status int
  );

Each big_json value can hold 500 KB or more of data.

There is also a secondary index on the status column.
Status can take various values; over 90% of all rows have status = 2.
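For a sense of scale, here is a rough estimate from the numbers above as a minimal Java sketch. It assumes "500K" means about 500 KB per big_json value, which is a lower bound since values can be larger. It also uses 90% as the share of status = 2 rows. The class and method names are hypothetical. The estimate shows how much big_json data would be touched if a status = 2 query read full rows rather than only keys. It does not claim that Cassandra actually does this.

```java
// Back-of-envelope estimate using the numbers from this thread:
// ~50,000 rows, over 90% with status = 2, each big_json ~500 KB or more.
// Assumption (hypothetical): "500K" means 500 KB = 500 * 1024 bytes.
final class VolumeEstimate {
    static final long KB = 1024L;

    // Rows expected to match a status value, given its share of all rows.
    static long matchingRows(long totalRows, double fraction) {
        return Math.round(totalRows * fraction);
    }

    // Bytes touched if every matching row's big_json value were read.
    static long bytesIfFullRowsRead(long rows, long bytesPerRow) {
        return rows * bytesPerRow;
    }

    public static void main(String[] args) {
        long rows = matchingRows(50_000L, 0.9);            // 45000 rows
        long bytes = bytesIfFullRowsRead(rows, 500 * KB);   // 23,040,000,000 bytes
        // Integer division, so this prints "45000 rows, about 21 GiB"
        System.out.println(rows + " rows, about " + bytes / (KB * KB * KB) + " GiB");
    }
}
```

Even as a lower bound, that is tens of gigabytes, far more than a typical JVM heap.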

Select user_name from users limit 80000;
is pretty fast.

Select user_name from users where status = 1;
is slower, even though much less data is returned.

Select user_name from users where status = 2;
always crashes.

What are we doing wrong? Could it be that Cassandra is actually reading all of the CF data rather than just the keys? It shouldn't need to touch the users CF at all, since all the data it needs is in the index CF.
Also, in my code I do the same thing with an Astyanax index query with pagination, and the behavior is identical.
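Below is a minimal Java sketch of key-based pagination over an index. The IndexPager class and its in-memory TreeMap are hypothetical stand-ins, loosely modeled on the description above of an index that maps each status value to user keys. This is not the Astyanax API. Each page returns at most pageSize keys that sort after the last key seen, so the client holds only one page at a time. Whether paging also limits how much the server reads per page depends on how the server executes the index query, and this sketch does not model that.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical in-memory index: status value -> sorted user keys.
// NOT the Astyanax or Cassandra API; it only illustrates key-based paging.
final class IndexPager {
    private final TreeMap<Integer, TreeSet<String>> index = new TreeMap<>();

    void put(String userName, int status) {
        index.computeIfAbsent(status, s -> new TreeSet<>()).add(userName);
    }

    // Up to pageSize keys with the given status that sort strictly after
    // lastKey (or from the beginning when lastKey is null).
    List<String> page(int status, String lastKey, int pageSize) {
        if (pageSize <= 0) throw new IllegalArgumentException("pageSize must be > 0");
        List<String> out = new ArrayList<>();
        TreeSet<String> keys = index.get(status);
        if (keys == null) return out;
        NavigableSet<String> tail = (lastKey == null) ? keys : keys.tailSet(lastKey, false);
        for (String k : tail) {
            if (out.size() == pageSize) break;
            out.add(k);
        }
        return out;
    }

    // Walks every page; only one page of keys is held at a time.
    int countAll(int status, int pageSize) {
        int total = 0;
        String last = null;
        while (true) {
            List<String> p = page(status, last, pageSize);
            total += p.size();
            if (p.size() < pageSize) return total; // a short page means no more keys
            last = p.get(p.size() - 1);            // resume after the last key seen
        }
    }
}
```

Resuming strictly after the last key returned, rather than using an offset, means no page has to skip over earlier results.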

Please help me:

1. Solve the immediate issue.
2. Understand whether anything in this use case indicates that we are not using Cassandra the way it is meant to be used.


Tamar Rosen