Return-Path:
Delivered-To: apmail-incubator-cassandra-user-archive@minotaur.apache.org
Received: (qmail 90722 invoked from network); 2 Feb 2010 18:51:51 -0000
Received: from hermes.apache.org (HELO mail.apache.org) (140.211.11.3)
    by minotaur.apache.org with SMTP; 2 Feb 2010 18:51:51 -0000
Received: (qmail 87871 invoked by uid 500); 2 Feb 2010 18:51:51 -0000
Delivered-To: apmail-incubator-cassandra-user-archive@incubator.apache.org
Received: (qmail 87831 invoked by uid 500); 2 Feb 2010 18:51:51 -0000
Mailing-List: contact cassandra-user-help@incubator.apache.org; run by ezmlm
Precedence: bulk
List-Help:
List-Unsubscribe:
List-Post:
List-Id:
Reply-To: cassandra-user@incubator.apache.org
Delivered-To: mailing list cassandra-user@incubator.apache.org
Received: (qmail 87822 invoked by uid 99); 2 Feb 2010 18:51:51 -0000
Received: from athena.apache.org (HELO athena.apache.org) (140.211.11.136)
    by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 02 Feb 2010 18:51:51 +0000
X-ASF-Spam-Status: No, hits=3.4 required=10.0 tests=HTML_MESSAGE,SPF_NEUTRAL
X-Spam-Check-By: apache.org
Received-SPF: neutral (athena.apache.org: local policy)
Received: from [209.85.219.212] (HELO mail-ew0-f212.google.com) (209.85.219.212)
    by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 02 Feb 2010 18:51:44 +0000
Received: by ewy4 with SMTP id 4so41665ewy.27
    for ; Tue, 02 Feb 2010 10:51:22 -0800 (PST)
MIME-Version: 1.0
Received: by 10.213.8.6 with SMTP id f6mr1108859ebf.93.1265136681887;
    Tue, 02 Feb 2010 10:51:21 -0800 (PST)
In-Reply-To:
References: <91790a981002020721m573edb5cqff9a4c859990a5a@mail.gmail.com>
Date: Tue, 2 Feb 2010 10:51:21 -0800
Message-ID: <36e83ac61002021051o1306403cm71aa45d9c73a3985@mail.gmail.com>
Subject: Re: How to retrieve keys from Cassandra ?
From: Jean-Denis Greze
To: cassandra-user@incubator.apache.org
Content-Type: multipart/alternative; boundary=0015174c0f7c842855047ea296d7

--0015174c0f7c842855047ea296d7
Content-Type: text/plain; charset=ISO-8859-1

Ok, so 0.6's https://issues.apache.org/jira/browse/CASSANDRA-745 permits "someone using RandomPartitioner to pass start="" and finish="" to get all of the rows in their cluster, although in an extremely inefficient way."

We are in a situation like Pierre's, where we need to know what's currently in the DB, so to speak, except that we have hundreds of millions of rows (and increasing), and maintaining an index of the keys in another CF, as Brandon suggests, is becoming difficult (we also don't like the double write on initial key inserts, especially in terms of transactionality).
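
With hundreds of millions of rows, a full scan of the kind 745 permits would have to be paged rather than fetched in one call. A minimal Python sketch of such a paging loop follows, run against an in-memory stand-in rather than a real cluster. `FakeCluster`, `fetch_range`, and the page size are hypothetical names for illustration and are not the Cassandra client API; the sketch also assumes rows come back in token (MD5) order and that the start key is inclusive.

```python
import hashlib


def token(key):
    # Simplified token: RandomPartitioner orders rows by a value derived
    # from the MD5 hash of the key, not by the key itself.
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class FakeCluster:
    """In-memory stand-in for a cluster (hypothetical, not a real client)."""

    def __init__(self, keys):
        self._keys = sorted(keys, key=token)

    def fetch_range(self, start, count):
        # Hypothetical call: up to `count` keys in token order, beginning
        # at `start` (inclusive). start="" means the beginning of the ring.
        begin = 0 if start == "" else self._keys.index(start)
        return self._keys[begin:begin + count]


def iter_all_keys(cluster, page_size=100):
    """Yield every key exactly once by paging through the token order."""
    if page_size < 2:
        raise ValueError("page_size must be at least 2 to make progress")
    start = ""
    while True:
        page = cluster.fetch_range(start, page_size)
        # Every page after the first begins with the last key already seen.
        fresh = page[1:] if start else page
        for key in fresh:
            yield key
        if len(page) < page_size:
            return  # a short page means the end of the ring was reached
        start = page[-1]
```

Each call costs one round trip per page, which is where the "extremely inefficient" part shows up at this scale.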

Also, every once in a while, we need to enhance our data as part of some functionality upgrade or refactoring. So far, what we do is enhance on reads (i.e., whenever we read a particular record, we check whether it is up to the latest version and, if it is not, enhance it), but there are many problems with this approach. We've been considering enhancing in a background process by running through all of the keys, which is why 745 is pretty exciting. We'd rather go through the inefficient operation once in a while than do a check on every read.
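
To make the two approaches concrete, here is a minimal Python sketch that contrasts enhancing on read with a one-off background pass. Everything in it is assumed for illustration: the `version` field, the two migrations, and the plain dict standing in for the column family are hypothetical and not taken from any real schema.

```python
LATEST_VERSION = 3


# Hypothetical migrations: each takes a record at version N and returns
# a new record at version N + 1.
def _v1_to_v2(rec):
    rec = dict(rec, version=2)
    rec.setdefault("tags", [])
    return rec


def _v2_to_v3(rec):
    rec = dict(rec, version=3)
    rec["name"] = rec.get("name", "").strip()
    return rec


MIGRATIONS = {1: _v1_to_v2, 2: _v2_to_v3}


def upgrade(rec):
    # Apply migrations one step at a time until the record is current.
    while rec["version"] < LATEST_VERSION:
        rec = MIGRATIONS[rec["version"]](rec)
    return rec


def read(store, key):
    """Enhance on read: every read pays for a version check."""
    rec = store[key]
    if rec["version"] < LATEST_VERSION:
        rec = store[key] = upgrade(rec)
    return rec


def upgrade_all(store, keys):
    """Background pass: visit each key once, return how many records changed."""
    changed = 0
    for key in keys:
        if store[key]["version"] < LATEST_VERSION:
            store[key] = upgrade(store[key])
            changed += 1
    return changed
```

Once `upgrade_all` has run over every key, the check in `read` could be dropped, which is the appeal of being able to enumerate the keys.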

Anyway, partially to address the efficiency concern, I've been playing around with the idea of having 745-like functionality on a per-node basis: a call to get all of the keys on a particular node as opposed to the entire cluster. It just seems like with a very large cluster with billions, tens of billions, or hundreds of billions of keys, 745 would just get overwhelmed. Just a thought.
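
As a rough illustration of why a per-node listing is plausible, the Python sketch below groups keys by the node whose token range they fall into. It is a simplified model and not Cassandra code: `Ring`, `owner`, and `keys_by_node` are hypothetical names, the token is a plain MD5 integer, and replication is ignored (each key is assigned to a single primary node).

```python
import bisect
import hashlib


def token(key):
    # Simplified token derived from the MD5 hash of the key.
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Hypothetical token ring: a node with token T owns (previous token, T]."""

    def __init__(self, node_tokens):
        # node_tokens maps node name -> token.
        self._ring = sorted((tok, name) for name, tok in node_tokens.items())
        self._tokens = [tok for tok, _ in self._ring]

    def owner(self, key):
        # First node whose token is >= the key's token, wrapping past the
        # highest token back to the node with the lowest one.
        i = bisect.bisect_left(self._tokens, token(key))
        return self._ring[i % len(self._ring)][1]


def keys_by_node(ring, keys):
    """Group keys by owning node, so each node's share can be listed alone."""
    groups = {}
    for key in keys:
        groups.setdefault(ring.owner(key), []).append(key)
    return groups
```

Because each node's share is one contiguous token range, a per-node call would only need to scan local data, and a cluster-wide pass could run the nodes in parallel.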

On Tue, Feb 2, 2010 at 7:31 AM, Jonathan Ellis <jbellis@gmail.com> wrote:
>
> More or less (but see
> https://issues.apache.org/jira/browse/CASSANDRA-745, in 0.6).
>
> Think of it this way: when you have a few billion keys, how useful is
> it to list them?
>
> -Jonathan
>
> 2010/2/2 Sébastien Pierre <sebastien.pierre@gmail.com>:
> > Hi all,
> > I would like to know how to retrieve the list of available keys available
> > for a specific column. There is the get_key_range method, but it is only
> > available when using the OrderPreservingPartitioner -- I use a
> > RandomPartitioner.
> > Does this mean that when using a RandomPartitioner, you cannot see which
> > keys are available in the database ?
> >  -- Sébastien



--
jeandenis@6coders.com
(917) 951-0636


--0015174c0f7c842855047ea296d7--