incubator-cassandra-user mailing list archives

From DuyHai Doan <doanduy...@gmail.com>
Subject Re: Data model for boolean attributes
Date Sat, 22 Mar 2014 17:40:43 GMT
Ben

> When you say beware of the cardinality, do you think that the cardinality
> is too low in this instance?

 Secondary indexes in C* are distributed across all the nodes containing
actual data, which helps avoid hot spots. However, since there are only 2
values for your boolean flag, even with good distribution the indexed values
on each node will be concentrated in only 2 index partitions (one for "true"
and one for "false"), and you may run into very wide rows.
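
 To make the cardinality point concrete, here is a minimal Python sketch. It
is a toy model, not how C* stores index data internally: it simply groups row
ids by the indexed value, the way an index keyed on the flag would. The
function name and the sample data are made up for illustration.

```python
from collections import defaultdict

def build_flag_index(rows):
    """Group row ids by flag value, like an index keyed on the flag."""
    index = defaultdict(list)
    for row_id, flag in rows:
        index[flag].append(row_id)
    return index

# Hypothetical sample data: 9000 rows, one third of them flagged true.
rows = [(row_id, row_id % 3 == 0) for row_id in range(9000)]
index = build_flag_index(rows)

# Only 2 index partitions exist, however many rows are indexed.
print(len(index))
print({flag: len(ids) for flag, ids in index.items()})
```

 Adding more rows never adds more partitions here; it only makes the two
existing ones wider.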

 Keeping a manual secondary index on this flag does not help much either,
because of the binary distribution: an index table keyed on the flag would
likewise hold only 2 very wide partitions.


 Saving data in 2 column families is similar to having a composite partition
key (id,flag), as I mentioned before, but it does not solve your
requirement to be able to paginate through all values of (id,flag). Maybe
you should create another column family with bucketing to support just this
query.
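
 One possible shape for such a bucketed column family, as a rough Python
sketch. Everything here is an assumption for illustration: the table name
x_by_flag, a text id, 16 buckets, and CRC32 as the client-side hash. The idea
is to split each flag value over several partitions so that no single row
grows too wide, while the set of partitions to scan stays small and known in
advance.

```python
import zlib

NUM_BUCKETS = 16  # assumption: tune to your data volume

# Hypothetical bucketed column family (names are illustrative only):
#   CREATE TABLE x_by_flag (
#       flag boolean,
#       bucket int,
#       id text,
#       PRIMARY KEY ((flag, bucket), id)
#   );

def bucket_for(row_id, num_buckets=NUM_BUCKETS):
    """Derive a stable bucket number from the row id on the client side."""
    return zlib.crc32(row_id.encode("utf-8")) % num_buckets

def partitions_for(flag, num_buckets=NUM_BUCKETS):
    """List every (flag, bucket) partition key to scan for one flag value."""
    return [(flag, bucket) for bucket in range(num_buckets)]

# On write: insert (flag, bucket_for(id), id) alongside the main row.
# On read: for each key in partitions_for(True), run
#   SELECT id FROM x_by_flag WHERE flag = true AND bucket = ?;
```

 To paginate, you would walk the partitions returned by partitions_for() one
after another and page through the ids inside each. Note that when a flag
flips, the entry has to be deleted from the old (flag, bucket) partition and
inserted into the new one, since the flag is part of the partition key.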

 Anyway, there is no magic with C*: the more different ways you want to
query the data, the more column families and denormalization you need.

 Regards

 Duy Hai DOAN



On Sat, Mar 22, 2014 at 4:47 AM, Ben Hood <0x6e6562@gmail.com> wrote:

> Hey Duy Hai,
>
> On Fri, Mar 21, 2014 at 7:34 PM, DuyHai Doan <doanduyhai@gmail.com> wrote:
> > Your previous "select * from x where flag = true;"  translate into:
> >
> >  SELECT * FROM x WHERE id=... AND flag = true
> >
> > Of course, you'll need to provide the id in any case.
>
> This is an interesting option, though this app needs to be able to
> paginate through all values of (id,flag).
>
> In this variant I guess you could do
>
> select distinct id,flag from x
>
> to get the unique partition keys and use those to paginate through the
> column family
>
> >  If you want to query only on the boolean flag, I'm afraid that manual
> > indexing or secondary index (beware of cardinality!) are your only
> > choices.
>
> When you say beware of the cardinality, do you think that the
> cardinality is too low in this instance?
>
> Also, how is secondary indexing or manual indexing better than
> maintaining a separate column family per flag (since there will only
> be two column families in this case)?
>
> Cheers,
>
> Ben
>
