From: Vivek Mishra
To: user@cassandra.apache.org
Date: Thu, 27 Sep 2012 19:41:42 +0530
Subject: Re:

One question: is the combination of user_cook_id, user_facebook_id,
user_cell_phone, and user_personal_id unique, or is each of them unique
individually?

If the combination is unique, then having an extra column (with an index
enabled) per row should work for you.

-Vivek

On Thu, Sep 27, 2012 at 7:22 PM, Andre Tavares wrote:

> Hi community,
>
> I have a question: I need to search a CF that has over 200 million
> rows to find a user key.
>
> To find the user, I have 4 keys (currently 4, but that number can
> increase): user_cook_id, user_facebook_id, user_cell_phone,
> user_personal_id
>
> If I don't find the user by the key I was given, I need to perform
> another query passing the other existing keys to find the user.
>
> My question: what is the better design for my CF to find the user by
> any of the 4 keys? I thought of creating a CF with secondary indexes
> like this:
>
> create column family users_test with comparator=UTF8Type and
> column_metadata=[
> {column_name: user_cook_id, validation_class: UTF8Type, index_type: KEYS},
> {column_name: user_facebook_id, validation_class: UTF8Type, index_type: KEYS},
> {column_name: user_cell_phone, validation_class: UTF8Type, index_type: KEYS},
> {column_name: user_personal_id, validation_class: UTF8Type, index_type: KEYS},
> {column_name: user_key, validation_class: UTF8Type, index_type: KEYS}
> ];
>
> Another approach is to create just one column in the User CF holding a
> generic key:
>
> create column family users_test with comparator=UTF8Type and
> column_metadata=[
> {column_name: generic_key, validation_class: UTF8Type, index_type: KEYS},
> {column_name: user_key, validation_class: UTF8Type, index_type: KEYS}
> ];
>
> where generic_key can be a user_cook_id value, or a user_facebook_id,
> user_cell_phone, or user_personal_id value ... the "problem" with this
> solution is that I have 200 million user IDs x 4 keys (user_cook_id,
> user_facebook_id, user_cell_phone, user_personal_id) = 800 million rows
>
> I am asking my friends here whether I am on the right track; suggestions
> are welcome. Thanks
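
For illustration only, here is a minimal sketch of the second ("generic key") approach from the quoted message, using a plain Python dict as a stand-in for the column family. Nothing here is a Cassandra client call. The names `lookup_row_key`, `register_user`, and `find_user` are hypothetical, and prefixing each value with its key type is an assumption made so that, for example, a phone number and a Facebook ID with the same value cannot collide. Whether that is needed depends on the uniqueness question asked above.

```python
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical in-memory stand-in for the lookup column family:
# row key = "<key_type>:<value>", stored value = the user's user_key.
LookupTable = Dict[str, str]


def lookup_row_key(key_type: str, value: str) -> str:
    """Build a generic row key; the type prefix is an assumption of this sketch."""
    return f"{key_type}:{value}"


def register_user(table: LookupTable, user_key: str, ids: Dict[str, str]) -> None:
    """Write one lookup row per known identifier of the user."""
    for key_type, value in ids.items():
        table[lookup_row_key(key_type, value)] = user_key


def find_user(table: LookupTable, candidates: Iterable[Tuple[str, str]]) -> Optional[str]:
    """Try each (key_type, value) pair in order and return the first user_key found."""
    for key_type, value in candidates:
        user_key = table.get(lookup_row_key(key_type, value))
        if user_key is not None:
            return user_key
    return None


if __name__ == "__main__":
    table: LookupTable = {}
    register_user(table, "user-1", {"user_cook_id": "c-100", "user_cell_phone": "5550100"})
    # The first key is unknown, so the lookup falls back to the second one.
    print(find_user(table, [("user_facebook_id", "fb-9"), ("user_cell_phone", "5550100")]))
```

Each registered identifier becomes its own row, which is where the 200 million x 4 = 800 million row estimate in the quoted message comes from. In exchange, every lookup is a direct read by row key rather than a secondary-index query.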