cassandra-user mailing list archives

From Thunder Stumpges <thunder.stump...@gmail.com>
Subject Re: Help me on Cassandra Data Modelling
Date Tue, 28 Jan 2014 22:20:29 GMT
Hey Naresh,

Unfortunately I don't have any further advice. I keep feeling like you're
looking at a search problem instead of a lookup problem. Perhaps Cassandra
is not the right tool for your need in this case. Perhaps something with a
full-text index type feature would help.

Or perhaps someone more experienced than I could come up with another
design.

Good luck,
Thunder



On Tue, Jan 28, 2014 at 9:07 AM, Naresh Yadav <nyadav.ait@gmail.com> wrote:

> Please share any input on my last email.
>
>
>
> On Tue, Jan 28, 2014 at 7:18 AM, Naresh Yadav <nyadav.ait@gmail.com> wrote:
>
>> Yes Thunder, you are right. I simplified things by moving the *tags* search
>> (partial/exact) into a separate column family, tagcombination, which acts as
>> the index for all tag-based searches. My original metricresult table will
>> store tagcombinationid and time in its columns; otherwise it was getting
>> complicated and I was not getting good results.
>>
>> Yes, I agree with you about duplicating storage in the tagcombination
>> column family. If I have billions of real tagcombinations with 8 tags in
>> each, then I am duplicating 2^8 combinations for each one to support partial
>> matching on that tagcombination, which will make this a very heavy table.
>> With individual keys, however, I will not be able to support searching by a
>> set of tags. Please suggest an alternative solution.
>>
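To make the storage cost of the partial-match keys concrete, here is a rough Python sketch that enumerates every non-empty subset of a tagcombination's tags. The helper name `partial_keys` and the sorted, comma-joined key format are assumptions for illustration only; the thread does not specify how keys are encoded.

```python
from itertools import combinations

def partial_keys(tags):
    """Return every non-empty subset of `tags` as a sorted, comma-joined key."""
    ordered = sorted(tags)  # fixed order so "India,Pen" and "Pen,India" map to one key
    return [
        ",".join(subset)
        for size in range(1, len(ordered) + 1)
        for subset in combinations(ordered, size)
    ]

# A 3-tag combination needs 2**3 - 1 = 7 partition keys.
three_tag_keys = partial_keys({"India", "Pen", "Shampoo"})
print(len(three_tag_keys), three_tag_keys)

# An 8-tag combination needs 2**8 - 1 = 255 partition keys.
eight_tag_keys = partial_keys(["t%d" % i for i in range(8)])
print(len(eight_tag_keys))
```

With billions of combinations, a fan-out of up to 255 rows per 8-tag combination is the duplication being discussed here.
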
>> Also, one of my colleagues suggested a totally different approach, but I
>> am not able to map it onto Cassandra.
>> According to him, we store all possible tags as columns, and for each
>> combination we just mark 0s and 1s depending on which tags
>> appear in that combination. So the data (TC1 as India, Pencil AND TC2 as
>> India, Pen) will look like:
>>
>>        India   Pencil   Pen
>> TC1    1       1        0
>> TC2    1       0        1
>>
>> I am not able to design an optimal column family for this in Cassandra. If I
>> design it as is, then to search for India, Pen I will select the India and
>> Pen columns, but that will touch each and every row because I am not able to
>> apply a criterion that matches only the 1s. I believe there can be a better
>> design that makes use of this good idea.
>>
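One possible reading of the 0/1 matrix is to keep only the 1s: for each tag, store the set of tagcombination ids in which it appears, then intersect those sets for a multi-tag query. The sketch below is plain in-memory Python, not Cassandra code; `build_index` and `find_combinations` are hypothetical names, and mapping each tag to its own partition with the intersection done client-side is an assumption, not something confirmed in the thread.

```python
from collections import defaultdict

def build_index(tag_combinations):
    """Map each tag to the set of combination ids that contain it (the 1s in its column)."""
    index = defaultdict(set)
    for combo_id, tags in tag_combinations.items():
        for tag in tags:
            index[tag].add(combo_id)
    return index

def find_combinations(index, query_tags):
    """Return ids of combinations that contain every tag in `query_tags`."""
    postings = [index.get(tag, set()) for tag in query_tags]
    if not postings:
        return set()
    postings.sort(key=len)        # start from the rarest tag to keep the working set small
    result = set(postings[0])     # copy so the index itself is not mutated
    for ids in postings[1:]:
        result &= ids
    return result

index = build_index({"TC1": {"India", "Pencil"}, "TC2": {"India", "Pen"}})
print(find_combinations(index, ["India", "Pen"]))
print(find_combinations(index, ["India"]))
```

This avoids touching rows whose value would be 0, at the cost of reading one set per queried tag; a very common tag would still produce a large set to intersect.
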
>> Please help me with this.
>>
>> Thanks
>> Naresh
>>
>>
>>
>> On Mon, Jan 27, 2014 at 11:30 PM, Thunder Stumpges <
>> thunder.stumpges@gmail.com> wrote:
>>
>>> Hey Naresh,
>>>
>>> You asked a similar question a week or two ago. It looks like you have
>>> simplified your needs quite a bit. Were you able to adjust your
>>> requirements or separate the issue? You had a complicated time dimension
>>> before, as well as a single "query" for multiple AND cases on tags.
>>>
>>> ....
>>>> c)Give data for Metric=Sales AND Tag=U.S.A
>>>>        O/P : 5 rows
>>>> d)Give data for Metric=Sales AND Period=Jan-10 AND Tag=U.S.A AND Tag=Pen
>>>>        O/P :1 row"
>>>
>>>
>>>
>>> I agree with Jonathan on the model for this simplified use case. However
>>> looking at how you are storing each partial tag combination as well as
>>> individual tags in the partitioning key, you will be severely duplicating
>>> your storage. You might want to just store individual keys in the
>>> partitioning key.
>>>
>>> Good luck,
>>> Thunder
>>>
>>>
>>>
>>>
>>> On Mon, Jan 27, 2014 at 8:48 AM, Naresh Yadav <nyadav.ait@gmail.com> wrote:
>>>
>>>> Thanks, Jonathan, for guiding me. I just want to confirm my
>>>> understanding:
>>>>
>>>> CREATE TABLE tagcombinations (
>>>>      partialtags text,              -- one tag or a set of tags
>>>>      tagcombinationid text,
>>>>      tagcombinationtags set<text>,  -- all tags in the combination
>>>>      PRIMARY KEY ((partialtags), tagcombinationid)
>>>> );
>>>> If I need to store two tagcombinations, TC1 as India, Pencil AND TC2 as
>>>> India, Pen, then the data will be stored as:
>>>>
>>>>                 TC1             TC2
>>>> India           India,Pencil    India,Pen
>>>>
>>>>                 TC1
>>>> Pencil          India,Pencil
>>>>
>>>>                 TC2
>>>> Pen             India,Pen
>>>>
>>>>                 TC1
>>>> India,Pencil    India,Pencil
>>>>
>>>>                 TC2
>>>> India,Pen       India,Pen
>>>>
>>>>
>>>> I hope I have understood the idea properly. Please confirm the design.
>>>>
>>>> Thanks
>>>> Naresh
>>>>
>>>>
>>>> On Mon, Jan 27, 2014 at 7:05 PM, Jonathan Lacefield <
>>>> jlacefield@datastax.com> wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>>   The trick with this data model is to get to a partition-based and/or
>>>>> cluster-based access pattern so C* returns results quickly.  In C* you
>>>>> want to model your tables based on your query access patterns, and
>>>>> remember that writes are cheap and fast in C*.
>>>>>
>>>>>   So, try something like the following:
>>>>>
>>>>>   1 Table with a Partition Key = Tag String
>>>>>          Tag String = "Tag" or "set of Tags"
>>>>>          Cluster based on tag combination (probably desc order)
>>>>>          This will allow you to select any combination that includes
>>>>> Tag or "set of Tags"
>>>>>          This will duplicate data as you will store 1 tag combination
>>>>> in every Tag partition, i.e. if a tag combination has 2 parts, then you
>>>>> will have 2 rows
>>>>>
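The write fan-out described above can be sketched as follows, assuming each row is a `(partition key, clustering key, full tag set)` tuple. The function name `rows_for` and the tuple layout are illustrative assumptions, not the author's implementation, and only the single-tag partitions are shown.

```python
def rows_for(combo_id, tags):
    """Build one row per tag: (tag partition key, combination id, full tag set)."""
    full = tuple(sorted(tags))
    return [(tag, combo_id, full) for tag in full]

rows = rows_for("TC1", {"India", "Pencil"})
for row in rows:
    print(row)

# Reading every combination that includes a tag is then a single-partition lookup.
india_partition = [r for r in rows + rows_for("TC2", {"India", "Pen"}) if r[0] == "India"]
print([r[1] for r in india_partition])
```

A combination with 2 tags yields 2 rows, matching the duplication noted above; the design trades extra writes for a single-partition read per tag.
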
>>>>>   Hope this helps.
>>>>>
>>>>> Jonathan Lacefield
>>>>> Solutions Architect, DataStax
>>>>> (404) 822 3487
>>>>>
>>>>> On Mon, Jan 27, 2014 at 7:24 AM, Naresh Yadav <nyadav.ait@gmail.com> wrote:
>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> Urgently need help on modelling this usecase on Cassandra.
>>>>>>
>>>>>> I have a concept of tags and tagcombinations.
>>>>>> For example, U.S.A and Pen are two tags, AND if they appear together in
>>>>>> some definition, then we register a tagcombination (U.S.A-Pen) for that.
>>>>>>
>>>>>> *tags* (U.S.A, Pen, Pencil, India, Shampoo)
>>>>>> *tagcombinations* (U.S.A-Pen, India-Pencil, U.S.A-Pencil, India-Pen,
>>>>>> India-Pen-Shampoo)
>>>>>>
>>>>>> - millions of tags
>>>>>> - billions of tagcombinations
>>>>>> - one tagcombination generally has 2-8 tags
>>>>>> - every day we get lakhs (hundreds of thousands) of new tagcombinations to write
>>>>>>
>>>>>> Queries need to support:
>>>>>> In how many tagcombinationids does one tag or a set of tags appear?
>>>>>> If I query for Pen,India then it should return two tagcombinations
>>>>>> (India-Pen, India-Pen-Shampoo). The query will be fired by the
>>>>>> application in real time.
>>>>>>
>>>>>> I am new to Cassandra and need to deliver fast, so please give your
>>>>>> inputs.
>>>>>>
>>>>>> Thanks
>>>>>> Naresh
