cassandra-user mailing list archives

From DuyHai Doan <doanduy...@gmail.com>
Subject Re: Summarizing Timestamp datatype
Date Wed, 18 Jun 2014 06:54:34 GMT
Hello Jason

If you want to check for the presence or absence of data for a given day,
you can add the date as a component of a composite partition key. For a day
with no data, Cassandra can usually answer from the bloom filter and avoid
hitting disk.

The only drawback of this model is that you need to provide the date
each time you query your data.
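
A minimal sketch of that idea in Python, assuming a hypothetical table
events_by_day whose partition key is (source, day). The table, the column
names and the has_rows callback are illustrative assumptions, not something
from this thread. The day is stored as text such as '2014-06-10', and the
%s placeholders follow the style the DataStax Python driver uses for simple
statements. The code only builds the probe parameters and filters the days;
running the probes against a cluster is left to the caller.

```python
from datetime import date, timedelta

# Hypothetical schema: the day is part of the composite partition key,
# so each (source, day) pair is its own partition.
CREATE_TABLE = """
CREATE TABLE events_by_day (
    source text,
    day text,
    ts timestamp,
    payload text,
    PRIMARY KEY ((source, day), ts)
)
"""

# Single-partition probe: LIMIT 1 is enough to learn whether the day has data.
PROBE = "SELECT ts FROM events_by_day WHERE source = %s AND day = %s LIMIT 1"


def days_in_range(start, end):
    """Return every date from start to end, inclusive."""
    return [start + timedelta(days=i) for i in range((end - start).days + 1)]


def probe_params(source, start, end):
    """Build one (source, day) parameter tuple per day for the PROBE query."""
    return [(source, d.isoformat()) for d in days_in_range(start, end)]


def days_with_data(days, has_rows):
    """Keep only the days for which has_rows(day) is true.

    has_rows is a caller-supplied function, for example one that runs PROBE
    through a driver session and reports whether a row came back.
    """
    return [d for d in days if has_rows(d)]
```

With 30 days this means 30 small single-partition reads, and the days with
no data are the ones the bloom filter can usually rule out cheaply.
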
On 18 June 2014 04:22, "Jason Lewis" <jlewis@packetnexus.com> wrote:

> That's how my schema is built. So far, I'm pulling the data out by a
> range of 30 days.  I want to see if I have data for every day, just
> wondering if it's possible in CQL, as opposed to how I'm doing it
> now, in python.
>
> On Tue, Jun 17, 2014 at 9:46 PM, Laing, Michael
> <michael.laing@nytimes.com> wrote:
> > If you can arrange to index your rows by:
> >
> > (<something else>, <your timestamp>)
> >
> > Then you can select ranges as you wish.
> >
> > This works because <something else> is the "partition key", arrived at by
> > hash (really it's a hash key), whereas <your timestamp> is the "clustering
> > key" (really it is a range key) which is kept in sorted order both in
> > memory and on disk.
> >
> > If you don't have too many rows, <something else> can be a constant.
> >
> > If you want to avoid hot spots, and/or have more rows, then <something
> > else> can be a shard, e.g. an int from 0 to 11. Then you can use IN to
> > select, plus your range, and it works very nicely in practice (in my
> > experience) despite being considered by some as an anti-pattern.
> >
> > ml
> >
> >
> > On Tue, Jun 17, 2014 at 8:41 PM, Jason Lewis <jlewis@packetnexus.com> wrote:
> >>
> >> I have data stored with the timestamp datatype. Is it possible to use
> >> CQL to return results based on if a row falls in a range for a day?
> >>
> >> Ex. If I have 20 rows that occur on 2014-06-10, no rows for 2014-06-11
> >> and 15 rows that occurred on 2014-06-12, I'd like to return results
> >> only for the days that have data: 2014-06-10 and 2014-06-12.
> >>
> >> Does that make sense?  Is it possible?  CQL doesn't seem super
> >> flexible and I haven't had luck coming up with a CQL solution.
> >>
> >> jas
> >
> >
>
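
The shard-plus-range layout quoted above could be sketched in Python as
follows. The table events_by_shard, its columns, and the crc32-based shard
function are assumptions made for illustration; only the shard range of 0 to
11 and the use of IN with a range on the clustering key come from the quoted
message. The code builds the CQL text and its parameters and reduces returned
timestamps to distinct days. It does not talk to a cluster.

```python
import zlib
from datetime import datetime

NUM_SHARDS = 12  # shards 0..11, as in the quoted example

# Hypothetical schema: shard is the partition key, ts the clustering key.
CREATE_TABLE = """
CREATE TABLE events_by_shard (
    shard int,
    ts timestamp,
    payload text,
    PRIMARY KEY (shard, ts)
)
"""


def shard_for(key, num_shards=NUM_SHARDS):
    """Map a string key to a stable shard number.

    crc32 is used because Python's built-in hash() of a str can differ
    between processes.
    """
    return zlib.crc32(key.encode("utf-8")) % num_shards


def range_query(start, end, num_shards=NUM_SHARDS):
    """Build a query over all shards for start <= ts < end."""
    shards = ", ".join(str(s) for s in range(num_shards))
    cql = (
        "SELECT shard, ts FROM events_by_shard "
        f"WHERE shard IN ({shards}) AND ts >= %s AND ts < %s"
    )
    return cql, (start, end)


def distinct_days(timestamps):
    """Reduce returned timestamps to the sorted list of days that have data."""
    return sorted({ts.date() for ts in timestamps})
```

This still does the per-day summarizing on the client, which matches the
observation earlier in the thread that the work is currently done in Python.
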
