incubator-cassandra-user mailing list archives

From Jean-Denis Greze <jeande...@6coders.com>
Subject Re: How to retrieve keys from Cassandra ?
Date Tue, 02 Feb 2010 18:51:21 GMT
Ok, so 0.6's https://issues.apache.org/jira/browse/CASSANDRA-745 permits
"someone using RandomPartitioner to pass start="" and finish="" to get all
of the rows in their cluster, although in an extremely inefficient way."
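
As a rough illustration of what such a scan looks like from the client side,
here is a small Python sketch. It does not talk to Cassandra: FakeStore and
its range_slice method are hypothetical stand-ins for a range call that takes
a start key and a row count. The sketch assumes the start key is inclusive
and that rows come back in MD5 token order, as they would under
RandomPartitioner.

```python
import hashlib
from bisect import bisect_left


def token(key):
    """MD5-based ordering value, mimicking how RandomPartitioner sorts rows."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class FakeStore:
    """Hypothetical in-memory stand-in for a cluster; not a Cassandra client."""

    def __init__(self, keys):
        self._rows = sorted(keys, key=token)
        self._tokens = [token(k) for k in self._rows]

    def range_slice(self, start, count):
        """Return up to `count` keys in token order, from `start` (inclusive).

        An empty start means "from the beginning of the ring".
        """
        i = 0 if start == "" else bisect_left(self._tokens, token(start))
        return self._rows[i:i + count]


def all_keys(store, page_size=100):
    """Page through every key, reusing the last key seen as the next start."""
    if page_size < 2:
        # With an inclusive start, a page of 1 would only re-read the last key.
        raise ValueError("page_size must be at least 2")
    start = ""
    first_page = True
    while True:
        page = store.range_slice(start, page_size)
        if not first_page:
            page = page[1:]  # drop the key already yielded by the previous page
        if not page:
            return
        for key in page:
            yield key
        start = page[-1]
        first_page = False
```

Every page costs one round trip, and the keys arrive in hash order rather
than any meaningful order, which is part of why this kind of scan is
inefficient on a large cluster.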

We are in a situation like Pierre's, where we need to know what's currently
in the DB, so to speak -- except that we have hundreds of millions of rows
(and increasing), and maintaining an index of the keys in another CF, as
Brandon suggests, is becoming difficult (we also don't like the double write
on initial key inserts, especially in terms of transactionality).

Also, every once in a while, we need to enhance our data as part of some
functionality upgrade or refactoring.  So far, what we do is enhance on
reads (i.e., whenever we read a particular record, we check whether it is at
the latest version, and if not, enhance it), but there are many problems with
this approach. We've been considering enhancing in a background process by
running through all of the keys, which is why 745 is pretty exciting.  We'd
rather go through the inefficient operation once in a while than do a check
on every read.
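
To make the two approaches concrete, here is a minimal Python sketch. A plain
dict stands in for the column family, and the "version" field, the
"name_lower" field, and the function names are all hypothetical, chosen only
for illustration.

```python
CURRENT_VERSION = 2


def upgrade(record):
    """Return a copy of `record` brought up to CURRENT_VERSION."""
    record = dict(record)
    if record.get("version", 1) < 2:
        # Hypothetical v1 -> v2 change: add a lowercased copy of the name.
        record["name_lower"] = record["name"].lower()
        record["version"] = 2
    return record


def read(store, key):
    """Enhance on read: every read pays for a version check."""
    record = store[key]
    if record.get("version", 1) < CURRENT_VERSION:
        record = upgrade(record)
        store[key] = record  # write the enhanced record back
    return record


def upgrade_all(store):
    """Background pass: walk every key once and upgrade stale records."""
    upgraded = 0
    for key in list(store):
        if store[key].get("version", 1) < CURRENT_VERSION:
            store[key] = upgrade(store[key])
            upgraded += 1
    return upgraded
```

The background pass is the part that needs a way to enumerate all keys; once
it has run, the check in read() could be dropped.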

Anyway, partially to address the efficiency concern, I've been playing
around with the idea of having 745-like functionality on a per-node basis: a
call to get all of the keys on a particular node as opposed to the entire
cluster.  It seems that on a very large cluster with billions, tens of
billions, or hundreds of billions of keys, a cluster-wide 745 scan would
simply get overwhelmed.  Just a thought.
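
A rough Python sketch of how keys could be grouped per node. This is a
simplified model, not Cassandra code: it assumes each node owns the token
range that ends at its own token, reduces the MD5 hash modulo 2**127 to get a
token, and ignores replication. The function names are hypothetical.

```python
import hashlib
from bisect import bisect_left

RING_SIZE = 2 ** 127  # simplified token space for this sketch


def token(key):
    """Simplified MD5-based token for a row key."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % RING_SIZE


def owner(node_tokens, key):
    """Return the token of the node whose range contains `key`.

    `node_tokens` must be sorted. A node owns (previous token, own token];
    keys past the highest token wrap around to the first node.
    """
    i = bisect_left(node_tokens, token(key))
    return node_tokens[i % len(node_tokens)]


def keys_by_node(node_tokens, keys):
    """Group keys by owning node, as a per-node listing would."""
    result = {t: [] for t in node_tokens}
    for key in keys:
        result[owner(node_tokens, key)].append(key)
    return result
```

With a per-node call, each node would only need to walk its own range, so a
background job could process the nodes independently instead of issuing one
scan against the whole ring.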







On Tue, Feb 2, 2010 at 7:31 AM, Jonathan Ellis <jbellis@gmail.com> wrote:
>
> More or less (but see
> https://issues.apache.org/jira/browse/CASSANDRA-745, in 0.6).
>
> Think of it this way: when you have a few billion keys, how useful is
> it to list them?
>
> -Jonathan
>
> 2010/2/2 Sébastien Pierre <sebastien.pierre@gmail.com>:
> > Hi all,
> > I would like to know how to retrieve the list of keys available
> > for a specific column. There is the get_key_range method, but it is only
> > available when using the OrderPreservingPartitioner -- I use a
> > RandomPartitioner.
> > Does this mean that when using a RandomPartitioner, you cannot see which
> > keys are available in the database ?
> >  -- Sébastien



--
jeandenis@6coders.com
(917) 951-0636
