Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@cassandra.apache.org
Received-SPF: pass (athena.apache.org: domain of cjbottaro@academicworks.com
 designates 209.85.128.170 as permitted sender)
MIME-Version: 1.0
Date: Wed, 19 Jun 2013 12:05:36 -0500
Message-ID: 
 <CAAw6nKsqhMs0cOJyocAyOci6KRdhER9W6i3Sx8JEet_En==inw@mail.gmail.com>
Subject: Date range queries
From: "Christopher J. Bottaro" <cjbottaro@academicworks.com>
To: user@cassandra.apache.org
Content-Type: multipart/alternative; boundary=047d7bacbce8a1884a04df84d6f4

--047d7bacbce8a1884a04df84d6f4
Content-Type: text/plain; charset=ISO-8859-1

Hello,

We are considering using Cassandra, and I want to make sure our use case
fits its strengths.  We have a table like this:

answers
-------
user_id | question_id | result | created_at

Our most common query will be something like:

SELECT * FROM answers WHERE user_id = 123 AND created_at > '01/01/2012' AND
created_at < '01/01/2013'

Sometimes we will also limit by a question_id or a list of question_ids.

Secondary indexes will be created on user_id and question_id.  We expect
the upper bound on the number of answers for a given user to be around
10,000.

My understanding is that Cassandra will run this query by loading all of
the answers for the given user into memory via the secondary index on
user_id, then scanning that set and filtering on created_at.
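
To make that assumption concrete, here is a rough Python sketch of the
behaviour I have in mind.  It is only a model of my understanding, not
Cassandra code: the ANSWERS list, the answers_in_range function and the
sample rows are all made up for illustration.

```python
from datetime import datetime

# Hypothetical in-memory stand-in for the answers table (sample data only).
ANSWERS = [
    {"user_id": 123, "question_id": 1, "result": "a", "created_at": datetime(2011, 12, 31)},
    {"user_id": 123, "question_id": 2, "result": "b", "created_at": datetime(2012, 6, 15)},
    {"user_id": 123, "question_id": 3, "result": "c", "created_at": datetime(2013, 1, 1)},
    {"user_id": 456, "question_id": 1, "result": "d", "created_at": datetime(2012, 3, 1)},
]


def answers_in_range(rows, user_id, start, end, question_ids=None):
    """Model of the assumed plan: fetch by user first, then filter by date."""
    # Step 1: what I assume the secondary index on user_id gives us,
    # i.e. every answer for that user (up to ~10,000 rows) held in memory.
    by_user = [r for r in rows if r["user_id"] == user_id]

    # Step 2: scan that set, keeping rows strictly inside the date range,
    # and optionally only those matching a list of question_ids.
    return [
        r
        for r in by_user
        if start < r["created_at"] < end
        and (question_ids is None or r["question_id"] in question_ids)
    ]


matches = answers_in_range(ANSWERS, 123, datetime(2012, 1, 1), datetime(2013, 1, 1))
print([r["question_id"] for r in matches])  # prints [2]
```

The point of the sketch is that the cost of step 2 grows with the total
number of answers for the user, not with the number of answers that fall
inside the date range.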

Given that this will be our most-used query and will run very often, is
this a bad use case for Cassandra?

Thanks for the help.

--047d7bacbce8a1884a04df84d6f4--