incubator-cassandra-user mailing list archives

From James Rothering <jrother...@codojo.me>
Subject Re: Data model for boolean attributes
Date Sun, 23 Mar 2014 03:36:19 GMT
Hi Duy:

The compound partition key seems perfect, but you say that pagination isn't
possible with it: why is that?

Regards,

James


On Sat, Mar 22, 2014 at 10:40 AM, DuyHai Doan <doanduyhai@gmail.com> wrote:

> Ben
>
>
> > When you say beware of the cardinality, do you think that the
> cardinality is too low in this instance?
>
>  Secondary indexes in C* are distributed across all the nodes containing
> the actual data, which helps avoid hot spots. However, since there are
> only 2 values for your boolean flag, even with good distribution, all
> indexed values will be concentrated in only 2 partitions (one for "true"
> and one for "false") on each node, and you may run into very wide rows.
>
>  Keeping a manual secondary index on this flag does not help much either,
> because of the binary distribution.
>
>
>  Saving data in 2 column families is similar to having a composite
> partition key (id,flag) like I mentioned before, but it does not solve your
> requirement to be able to paginate through all values of (id,flag). Maybe
> you should create another column family with bucketing to support just this
> query.
>
>  Anyway, there is no magic with C*: the more different ways you want to
> query the data, the more column families and denormalization you need.
>
>  Regards
>
>  Duy Hai DOAN
>
>
>
> On Sat, Mar 22, 2014 at 4:47 AM, Ben Hood <0x6e6562@gmail.com> wrote:
>
>> Hey Duy Hai,
>>
>> On Fri, Mar 21, 2014 at 7:34 PM, DuyHai Doan <doanduyhai@gmail.com>
>> wrote:
>> > Your previous "select * from x where flag = true;" translates into:
>> >
>> >  SELECT * FROM x WHERE id=... AND flag = true
>> >
>> > Of course, you'll need to provide the id in any case.
>>
>> This is an interesting option, though this app needs to be able to
>> paginate through all values of (id,flag).
>>
>> In this variant I guess you could do
>>
>> select distinct id,flag from x
>>
>> to get the unique partition keys and use those to paginate through the
>> column family
>>
>> >  If you want to query only on the boolean flag, I'm afraid that manual
>> > indexing or secondary index (beware of cardinality!) are your only
>> > choices.
>>
>> When you say beware of the cardinality, do you think that the
>> cardinality is too low in this instance?
>>
>> Also, how is secondary indexing or manual indexing better than
>> maintaining a separate column family per flag (since there will only
>> be two column families in this case)?
>>
>> Cheers,
>>
>> Ben
>>
>
>
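
The "column family with bucketing" suggested in the quoted message could look
roughly like the sketch below. It is only an in-memory Python illustration, not
code from the thread and not a real Cassandra client. The table layout
PRIMARY KEY ((flag, bucket), id), the bucket count of 4, and the names
bucket_for and BucketedFlagIndex are all assumptions made for the example.

```python
import zlib
from collections import defaultdict

NUM_BUCKETS = 4  # hypothetical value; pick it so each partition stays small


def bucket_for(row_id: str) -> int:
    """Map an id to a stable bucket (crc32 is deterministic across runs)."""
    return zlib.crc32(row_id.encode("utf-8")) % NUM_BUCKETS


class BucketedFlagIndex:
    """In-memory stand-in for a table keyed by ((flag, bucket), id)."""

    def __init__(self):
        # partition key (flag, bucket) -> set of ids in that partition
        self.partitions = defaultdict(set)

    def insert(self, row_id: str, flag: bool) -> None:
        self.partitions[(flag, bucket_for(row_id))].add(row_id)

    def set_flag(self, row_id: str, old_flag: bool, new_flag: bool) -> None:
        # flag is part of the partition key, so a flip is delete + insert
        self.partitions[(old_flag, bucket_for(row_id))].discard(row_id)
        self.insert(row_id, new_flag)

    def pages(self, flag: bool, page_size: int):
        """Yield pages of ids for one flag value, one bucket at a time."""
        for bucket in range(NUM_BUCKETS):
            ids = sorted(self.partitions.get((flag, bucket), ()))
            for start in range(0, len(ids), page_size):
                yield ids[start:start + page_size]


if __name__ == "__main__":
    index = BucketedFlagIndex()
    for i in range(10):
        index.insert(f"id{i}", i % 2 == 0)
    for page in index.pages(True, page_size=2):
        print(page)
```

With this layout, each flag value is spread over NUM_BUCKETS partitions instead
of one, and a reader walks the buckets in order to page through every id for a
given flag. The cost is that changing a flag means moving the row between
partitions, since the flag is part of the partition key.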
