Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@cassandra.apache.org
Received-SPF: pass (athena.apache.org: domain of mishra.vivs@gmail.com
 designates 209.85.220.44 as permitted sender)
MIME-Version: 1.0
In-Reply-To: 
 <CAHvHObkQ_ek4Pt9qcdj+OuE7rGH2MbebEmNFfxovx7Z-U2BA=A@mail.gmail.com>
References: 
 <CAHvHObkQ_ek4Pt9qcdj+OuE7rGH2MbebEmNFfxovx7Z-U2BA=A@mail.gmail.com>
Date: Thu, 27 Sep 2012 19:41:42 +0530
Message-ID: 
 <CANJo1uB2XTtHNfptmoNS1_JxS-s0Ogz5eRX2-4s9YZeK8XGOuw@mail.gmail.com>
Subject: Re:
From: Vivek Mishra <mishra.vivs@gmail.com>
To: user@cassandra.apache.org
Content-Type: multipart/alternative; boundary=f46d042e0255c0717b04caaf8446

--f46d042e0255c0717b04caaf8446
Content-Type: text/plain; charset=ISO-8859-1

One question: is the combination of user_cook_id, user_facebook_id,
user_cell_phone, and user_personal_id unique, or is each of them unique
individually?

If the combination is unique, then having an extra column (index enabled)
per row should work for you.
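
To make the extra-column idea concrete, here is a minimal in-memory sketch in
Python (not cassandra-cli, and not a Cassandra client). It assumes the extra
column stores the four ids joined in a fixed order; the column name
lookup_key, the "|" separator, and the UsersTest class are hypothetical names
used only for illustration.

```python
# In-memory model only: one dict stands in for the column family and another
# dict stands in for the secondary index on the extra column.
ID_COLUMNS = ("user_cook_id", "user_facebook_id",
              "user_cell_phone", "user_personal_id")
SEPARATOR = "|"  # assumption: never occurs inside an id value


def combined_key(row):
    """Join the four ids in a fixed order; missing ids become empty strings."""
    return SEPARATOR.join(row.get(name, "") for name in ID_COLUMNS)


class UsersTest:
    def __init__(self):
        self.rows = {}   # user_key -> columns
        self.index = {}  # lookup_key value -> set of user_key

    def insert(self, user_key, columns):
        row = dict(columns)
        # Hypothetical extra indexed column holding the combined value.
        row["lookup_key"] = combined_key(row)
        self.rows[user_key] = row
        self.index.setdefault(row["lookup_key"], set()).add(user_key)

    def find(self, **ids):
        """Equality lookup on the extra column, as an index query would do."""
        return sorted(self.index.get(combined_key(ids), ()))


cf = UsersTest()
cf.insert("u1", {"user_cook_id": "c-1", "user_facebook_id": "fb-1",
                 "user_cell_phone": "555-0100", "user_personal_id": "p-1"})

# Lookup with all four ids, then with only one of them.
full = cf.find(user_cook_id="c-1", user_facebook_id="fb-1",
               user_cell_phone="555-0100", user_personal_id="p-1")
partial = cf.find(user_cook_id="c-1")
```

In this sketch the combined column only matches when every id is supplied. A
lookup by a single id would still need the per-column indexes or the generic
key layout from the quoted message.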

-Vivek


On Thu, Sep 27, 2012 at 7:22 PM, Andre Tavares <andre271@gmail.com> wrote:

>
> Hi community,
>
> I have a question: I need to search a CF that has over 200 million
> rows to find a user key.
>
> To find the user, I have 4 keys (currently 4, but the number can
> increase): user_cook_id, user_facebook_id, user_cell_phone, and
> user_personal_id.
>
> If I don't find the user by the supplied key, I need to perform another
> query with the other existing keys to find the user.
>
> My question: what is the best design for my CF to find the user by any of
> the 4 keys? I thought of creating a CF with secondary indexes like this:
>
> create column family users_test with comparator=UTF8Type and
> column_metadata=[
> {column_name: user_cook_id, validation_class: UTF8Type, index_type: KEYS},
> {column_name: user_facebook_id, validation_class: UTF8Type, index_type: KEYS},
> {column_name: user_cell_phone, validation_class: UTF8Type, index_type: KEYS},
> {column_name: user_personal_id, validation_class: UTF8Type, index_type: KEYS},
> {column_name: user_key, validation_class: UTF8Type, index_type: KEYS}
> ];
>
> Another approach is to create just one indexed column in the user CF
> holding a generic key:
>
> create column family users_test with comparator=UTF8Type and
> column_metadata=[
> {column_name: generic_key, validation_class: UTF8Type, index_type: KEYS},
> {column_name: user_key, validation_class: UTF8Type, index_type: KEYS}
> ];
>
> where generic_key can hold a user_cook_id, user_facebook_id,
> user_cell_phone, or user_personal_id value. The "problem" with this solution
> is that I have 200 million user ids x 4 keys (user_cook_id,
> user_facebook_id, user_cell_phone, user_personal_id) = 800 million rows.
>
> Am I on the right track? Suggestions are welcome.
> Thanks.
>

--f46d042e0255c0717b04caaf8446--