Ok, so 0.6's https://issues.apache.org/jira/browse/CASSANDRA-745 permits "someone using RandomPartitioner to pass start="" and finish="" to get all of the rows in their cluster, although in an extremely inefficient way."
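Here is a minimal Python sketch of paging through every row with start="" and finish="". It is not the Cassandra client API. `FakeStore` and its `fetch_range(start, finish, count)` are hypothetical stand-ins for a range-slice call. They are assumed to return up to `count` rows in token order, with `start` inclusive and an empty string meaning "no bound". Because the start is inclusive, each page after the first repeats the previous page's last row, so the loop drops it. That is also why `page_size` must be at least 2.

```python
import hashlib
from typing import Callable, Dict, Iterator, List, Tuple

Row = Tuple[str, Dict[str, object]]


def md5_token(key: str) -> int:
    # Loosely modeled on RandomPartitioner: MD5 of the key, read as a signed
    # big-endian integer, then made non-negative.
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return abs(int.from_bytes(digest, "big", signed=True))


class FakeStore:
    """Hypothetical in-memory stand-in for a cluster; not a real client API."""

    def __init__(self, rows: Dict[str, Dict[str, object]]) -> None:
        # Rows are kept in token order, which is the order a range scan sees
        # under a random partitioner (not key order).
        self._rows = sorted(rows.items(), key=lambda kv: md5_token(kv[0]))

    def fetch_range(self, start: str, finish: str, count: int) -> List[Row]:
        # "" means "no bound"; the start key itself is included if present.
        lo = md5_token(start) if start else -1
        hi = md5_token(finish) if finish else None
        page: List[Row] = []
        for key, columns in self._rows:
            token = md5_token(key)
            if token < lo:
                continue
            if hi is not None and token > hi:
                break
            page.append((key, columns))
            if len(page) == count:
                break
        return page


def iterate_all_rows(
    fetch: Callable[[str, str, int], List[Row]], page_size: int = 100
) -> Iterator[Row]:
    # Walk the whole ring with start="" and finish="", one page at a time.
    if page_size < 2:
        raise ValueError("page_size must be at least 2 to make progress")
    start = ""
    first_page = True
    while True:
        page = fetch(start, "", page_size)
        # Every page after the first begins with the previous page's last row,
        # because the start key is inclusive; skip that duplicate.
        yield from (page if first_page else page[1:])
        if len(page) < page_size:
            return
        start = page[-1][0]
        first_page = False


store = FakeStore({f"user{i}": {"version": 1} for i in range(10)})
all_keys = [key for key, _ in iterate_all_rows(store.fetch_range, page_size=3)]
print(len(all_keys))  # prints 10
```

With a random partitioner, rows come back in token (hash) order. The scan is therefore only useful for visiting everything, not for key-ordered ranges.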
We are in a situation like Pierre's, where we need to know what's currently in the DB, so to speak. The difference is that we have hundreds of millions of rows (and growing), and maintaining an index of the keys in another CF, as Brandon suggests, is becoming difficult. We also don't like the double write on initial key inserts, especially in terms of transactionality.
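The double-write concern can be illustrated with a small simulation. This sketch is for illustration only: a dict `rows` stands in for the data column family, and a set `key_index` stands in for the key-index column family. `insert_with_key_index` and `SimulatedFailure` are hypothetical names. No transaction spans the two writes, so a failure between them leaves a row that the index never lists.

```python
from typing import Dict, Set


class SimulatedFailure(Exception):
    """Stands in for a write that fails after the row write already succeeded."""


def insert_with_key_index(
    rows: Dict[str, Dict[str, object]],
    key_index: Set[str],
    key: str,
    columns: Dict[str, object],
    fail_index_write: bool = False,
) -> None:
    # Write 1: the data row itself.
    rows[key] = columns
    # Write 2: record the key in a separate "index" column family.
    # Nothing makes these two writes atomic, so a failure here leaves them out of sync.
    if fail_index_write:
        raise SimulatedFailure(f"index write for {key!r} failed")
    key_index.add(key)


def unindexed_keys(rows: Dict[str, Dict[str, object]], key_index: Set[str]) -> Set[str]:
    # Rows that exist but that a scan of the index would never return.
    return set(rows) - key_index


rows: Dict[str, Dict[str, object]] = {}
key_index: Set[str] = set()
insert_with_key_index(rows, key_index, "a", {"v": 1})
try:
    insert_with_key_index(rows, key_index, "b", {"v": 2}, fail_index_write=True)
except SimulatedFailure:
    pass
print(unindexed_keys(rows, key_index))  # prints {'b'}
```

Writing the index entry first would flip the failure mode. The index could then list a key whose row was never written, and readers would have to tolerate that.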
Also, every once in a while, we need to enhance our data as part of a functionality upgrade or refactoring. So far, we enhance on reads: whenever we read a particular record, we check whether it is up to the latest version and, if not, enhance it. This approach has many problems. We've been considering enhancing the data in a background process that runs through all of the keys, which is why 745 is so exciting. We'd rather run the inefficient operation once in a while than do a check on every read.
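The difference between the two strategies is mostly where the version check runs. This sketch assumes a hypothetical `schema_version` column on each row and an application-defined `upgrade_row`. Neither comes from Cassandra. `read_with_upgrade` is the enhance-on-read path, and `background_sweep` is the one-off pass that a full key scan would drive.

```python
from typing import Dict, Iterable, Tuple

Row = Dict[str, object]

CURRENT_VERSION = 3  # hypothetical: latest schema version the application expects


def needs_upgrade(row: Row) -> bool:
    # Rows written before versioning existed are treated as version 0.
    return int(row.get("schema_version", 0)) < CURRENT_VERSION


def upgrade_row(row: Row) -> Row:
    # Hypothetical upgrade; a real one would apply each migration step in turn.
    upgraded = dict(row)
    upgraded["schema_version"] = CURRENT_VERSION
    return upgraded


def read_with_upgrade(store: Dict[str, Row], key: str) -> Row:
    # Enhance-on-read: every read pays for the version check.
    row = store[key]
    if needs_upgrade(row):
        row = upgrade_row(row)
        store[key] = row
    return row


def background_sweep(store: Dict[str, Row], all_rows: Iterable[Tuple[str, Row]]) -> int:
    # Background enhancement: a single pass over every key (e.g. driven by a
    # full range scan). Assuming new writes use CURRENT_VERSION, reads can
    # skip the check afterwards.
    upgraded = 0
    for key, row in all_rows:
        if needs_upgrade(row):
            store[key] = upgrade_row(row)
            upgraded += 1
    return upgraded


store: Dict[str, Row] = {"a": {"schema_version": 1}, "b": {"schema_version": 3}, "c": {}}
print(background_sweep(store, list(store.items())))  # prints 2
```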
Anyway, partly to address the efficiency concern, I've been playing around with the idea of 745-like functionality on a per-node basis: a call that returns all of the keys on a particular node rather than on the entire cluster. It seems that on a very large cluster, with billions, tens of billions, or hundreds of billions of keys, a single cluster-wide 745 scan would simply be overwhelmed. Just a thought.
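For the per-node idea, recall that each node's primary range runs from its predecessor's token (exclusive) to its own token (inclusive), wrapping around the ring. A per-node listing would then be a range scan over that node's slice only. This sketch only shows which keys fall in which slice. The node names and tokens are hypothetical, and `md5_token` is only loosely modeled on RandomPartitioner. Replication is ignored: with a replication factor above 1, listing only each node's primary range avoids reporting a key once per replica.

```python
import bisect
import hashlib
from typing import Dict, Iterable, List


def md5_token(key: str) -> int:
    # Loosely modeled on RandomPartitioner: MD5 of the key as a signed
    # big-endian integer, made non-negative.
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return abs(int.from_bytes(digest, "big", signed=True))


def primary_owner(tokens: List[int], names: List[str], token: int) -> str:
    # tokens is sorted ascending; node i owns (tokens[i-1], tokens[i]], and
    # anything above the largest token wraps around to node 0.
    i = bisect.bisect_left(tokens, token)
    return names[i % len(names)]


def keys_by_node(keys: Iterable[str], ring: Dict[str, int]) -> Dict[str, List[str]]:
    # ring maps hypothetical node names to their tokens.
    ordered = sorted(ring.items(), key=lambda item: item[1])
    names = [name for name, _ in ordered]
    tokens = [token for _, token in ordered]
    result: Dict[str, List[str]] = {name: [] for name in names}
    for key in keys:
        result[primary_owner(tokens, names, md5_token(key))].append(key)
    return result


ring = {"node-a": 2**125, "node-b": 2**126, "node-c": 2**127}
groups = keys_by_node((f"user{i}" for i in range(1000)), ring)
print({name: len(keys) for name, keys in groups.items()})
```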
On Tue, Feb 2, 2010 at 7:31 AM, Jonathan Ellis <firstname.lastname@example.org> wrote:
> More or less (but see
> , in 0.6).
> Think of it this way: when you have a few billion keys, how useful is
> it to list them?
> 2010/2/2 Sébastien Pierre <email@example.com>
> > Hi all,
> > I would like to know how to retrieve the list of keys available
> > for a specific column. There is the get_key_range method, but it is only
> > available when using the OrderPreservingPartitioner -- I use a
> > RandomPartitioner.
> > Does this mean that when using a RandomPartitioner, you cannot see which
> > keys are available in the database?
> > -- Sébastien