incubator-cassandra-user mailing list archives

From Vivek Mishra <mishra.v...@gmail.com>
Subject Re:
Date Thu, 27 Sep 2012 14:57:23 GMT
So even with the secondary index approach, you can still hold a unique
combination key per row. If any of these keys is not present, it simply will
not be part of that combination key, and you will still get a unique
value for each row every time. That can definitely avoid duplicate rows.

Or you can even make that combination key the row key as well.

That can be one of the alternatives; I still think other ways can be
worked out.
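
A minimal sketch of how such a combination key could be built, assuming a
fixed field order and a "|" separator that never appears in the values. The
helper name combination_key and the field=value encoding are illustrative
choices, not an existing API:

```python
# Field names are taken from the thread; the order is an assumption and
# must stay fixed so the same user always produces the same key.
KEY_FIELDS = (
    "user_cook_id",
    "user_facebook_id",
    "user_cell_phone",
    "user_personal_id",
)


def combination_key(user, sep="|"):
    """Join the identifiers that are present, skipping the missing ones."""
    parts = []
    for field in KEY_FIELDS:
        value = user.get(field)
        if value:  # a missing or empty identifier is left out of the key
            parts.append(f"{field}={value}")
    if not parts:
        raise ValueError("at least one identifier is required")
    return sep.join(parts)


# Example: only two of the four identifiers are known for this user.
key = combination_key({"user_cook_id": "c1", "user_cell_phone": "555"})
```

The resulting string could be stored in the extra indexed column or used as
the row key. Since each identifier is unique on its own, any key containing
at least one of them is unique as well.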


-Vivek

On Thu, Sep 27, 2012 at 7:51 PM, Andre Tavares <andre271@gmail.com> wrote:

> user_cook_id, user_facebook_id, user_cell_phone, user_personal_id:
> Will the combination key of all of them be unique? Or are all of them
> unique individually?
>
> Combination key of all will be unique?
>
> no ...
>
>
>
> Or all of them are unique individually?
> yes ... all of them are unique individually
>
>
>
> 2012/9/27 Vivek Mishra <mishra.vivs@gmail.com>
>
>> 1 question.
>> user_cook_id, user_facebook_id, user_cell_phone, user_personal_id:
>> Will the combination key of all of them be unique? Or are all of them
>> unique individually?
>>
>> If a combination can be unique, then having an extra column (index
>> enabled) per row should work for you.
>>
>> -Vivek
>>
>>
>>
>> On Thu, Sep 27, 2012 at 7:22 PM, Andre Tavares <andre271@gmail.com> wrote:
>>
>>>
>>> Hi community,
>>>
>>> I have a question: I need to search a CF that has over 200
>>> million rows to find a user key.
>>>
>>> To find the user, I have 4 keys (currently 4, but that number can
>>> increase): user_cook_id, user_facebook_id, user_cell_phone,
>>> user_personal_id
>>>
>>> If I don't find the user by the supplied key, I need to perform another
>>> query passing the other existing keys to find the user.
>>>
>>> My question: what is the best design for my CF to find the user by the
>>> 4 keys? I thought of creating a CF with secondary indexes like this:
>>>
>>> create column family users_test with comparator=UTF8Type and
>>> column_metadata=[
>>> {column_name: user_cook_id, validation_class: UTF8Type, index_type: KEYS},
>>> {column_name: user_facebook_id, validation_class: UTF8Type, index_type: KEYS},
>>> {column_name: user_cell_phone, validation_class: UTF8Type, index_type: KEYS},
>>> {column_name: user_personal_id, validation_class: UTF8Type, index_type: KEYS},
>>> {column_name: user_key, validation_class: UTF8Type, index_type: KEYS}
>>> ];
>>>
>>> Another approach is to create just one column in the user CF holding a
>>> generic key:
>>>
>>> create column family users_test with comparator=UTF8Type and
>>> column_metadata=[
>>> {column_name: generic_key, validation_class: UTF8Type, index_type: KEYS},
>>> {column_name: user_key, validation_class: UTF8Type, index_type: KEYS}
>>> ];
>>>
>>> where generic_key can be a user_cook_id value, or a user_facebook_id,
>>> user_cell_phone, or user_personal_id value ... the "problem" with this
>>> solution is that I have 200 million users x 4 keys (user_cook_id,
>>> user_facebook_id, user_cell_phone, user_personal_id) = 800 million rows
>>>
>>> I would like to know whether I am on the right track; suggestions are
>>> welcome. Thanks
>>>
>>
>>
>
