incubator-cassandra-user mailing list archives

From "Hiller, Dean" <Dean.Hil...@nrel.gov>
Subject Re: Why Cassandra secondary indexes are so slow on just 350k rows?
Date Thu, 30 Aug 2012 20:29:30 GMT
It seems to me you may want to revisit the design a bit (though I am not 100% sure, since I
may not understand the entire context). I could see having partitions, with a few clients
polling each partition, so that you can basically scale to infinity with no issues.  If you
are doing all this polling from one machine, it just won't scale very well.

playOrm does this for you, but the basic pattern, which you can implement yourself without
playOrm, would be:

Row 1
Row 2
Row 3
Row 4

Index row for partition 1 - <val>.row1, <val>.row4
Index row for partition 2 - <val>.row2, <val>.row3
…
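
A minimal in-memory sketch of the layout above, not playOrm's actual implementation. The names `rows`, `index_rows`, `insert` and `scan_partition` are made up for illustration, and plain Python dicts and sorted lists stand in for the column family and the wide index rows. Each index row holds (value, row key) columns kept in sorted order, so a scan for one value reads a contiguous slice.

```python
import bisect
from collections import defaultdict

# Stand-in for the data column family: row key -> row contents.
rows = {}
# One index row per partition: a sorted list of (value, row_key) columns.
index_rows = defaultdict(list)


def insert(partition, row_key, value, data):
    """Store the row and add a (value, row_key) column to its partition's index row."""
    rows[row_key] = data
    bisect.insort(index_rows[partition], (value, row_key))


def scan_partition(partition, value):
    """Column-scan one index row for `value`, then look up the actual rows."""
    cols = index_rows[partition]
    # (value,) sorts before every (value, row_key), so this finds the first match.
    start = bisect.bisect_left(cols, (value,))
    result = []
    for v, key in cols[start:]:
        if v != value:
            break
        result.append(rows[key])
    return result
```

With rows 1 and 4 inserted under partition 1 and rows 2 and 3 under partition 2, `scan_partition(2, "pending")` touches only partition 2's index row, which is what keeps the cost tied to the partition rather than to the whole data set.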

Now each server is responsible for polling / scanning its partitions' index rows above.  If
you have 2 servers and 2 partitions, each one would column-scan its index row above and then
look up the actual rows.  If it is unbalanced, like 5 servers and 28 partitions, you can of
course use the hash code of the partition and the number of servers to figure out whether a
server owns that partition for polling or not.
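
One way the hash-code ownership check could look, as a rough sketch only. The function names and the `partition-N` ids are hypothetical, and CRC32 is my choice of hash because it gives the same answer on every server (Python's built-in `hash()` for strings is randomized per process).

```python
import zlib


def owner_of(partition_id, num_servers):
    """Return the index of the server that should poll this partition."""
    # CRC32 is stable across processes and machines, so every server agrees.
    return zlib.crc32(partition_id.encode("utf-8")) % num_servers


def partitions_for(server_index, partition_ids, num_servers):
    """List the partitions that the given server is responsible for polling."""
    return [p for p in partition_ids if owner_of(p, num_servers) == server_index]
```

For example, with 28 partition ids and 5 servers, server 0 would poll `partitions_for(0, ids, 5)` and skip the rest. The split is only roughly even, and changing the number of servers reassigns most partitions, so this assumes the server count is fairly stable.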

All of this is automatic in playOrm with S-JQL (Scalable-JQL – one minor change to SQL to
make it scalable).

Later,
Dean



From: Edward Kibardin <infalco@gmail.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Thursday, August 30, 2012 2:14 PM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Re: Why Cassandra secondary indexes are so slow on just 350k rows?

It should not depend on the number of rows in the CF, but on the number of rows per index value
