From: "Hiller, Dean" <Dean.Hiller@nrel.gov>
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Mon, 24 Sep 2012 09:24:13 -0600
Subject: Re: Correct model

I am confused. In this email you say you want "get all requests for a
user" and in a previous one you said "Select all the users which has new
requests, since date D" so let me answer both...

For the latter, you make ONE query into the latest partition (ONE
partition) of the GlobalRequestsCF, which gives you the most recent
requests ALONG with the user ids of those requests. If you queried all
partitions, you would most likely blow out your JVM memory.

For the former, you make ONE query to the UserRequestsCF with
userid = <your user id> to get all the requests for that user.
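
As a rough illustration of the two lookups above (this is not Hector or
Astyanax code), here is a Python sketch that uses plain dicts as stand-ins
for the two column families. The function names, the monthly partition
label, and keying the user requests by user id alone are all assumptions
made for the example.

```python
from datetime import datetime

# In-memory stand-ins for the two column families (assumption: real code
# would go through a Cassandra client instead of dicts).
global_requests = {}  # row key: partition label -> {(timestamp, user_id): ""}
user_requests = {}    # row key: user_id -> {timestamp: request payload}


def partition_label(ts):
    """Monthly partition label such as '2012-09' (monthly is an assumption)."""
    return ts.strftime("%Y-%m")


def record_request(user_id, ts, payload):
    """Write one request into both column families."""
    user_requests.setdefault(user_id, {})[ts] = payload
    global_requests.setdefault(partition_label(ts), {})[(ts, user_id)] = ""


def users_with_requests_since(now, since):
    """ONE row lookup: only the latest partition of GlobalRequests is read."""
    latest = global_requests.get(partition_label(now), {})
    return sorted({user_id for (ts, user_id) in latest if ts >= since})


def requests_for_user(user_id):
    """ONE row lookup by user id, returned newest first."""
    row = user_requests.get(user_id, {})
    return [row[ts] for ts in sorted(row, reverse=True)]
```

Because only the latest partition is read, a date D that falls before the
start of that partition would miss older requests; that is a limit of this
sketch, not a statement about how the real query must behave.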

You mean too many rows, not a row too long, right? I am assuming each
request will be a different row, not a new column. Is having billions of
ROWS something non performatic in Cassandra? I know Cassandra allows up to
2 billion columns for a CF, but I am not aware of a limitation for rows...

Sorry, I was skipping some context. A lot of the backing indexing is
sometimes done as a long row, so in PlayOrm, too many rows in a partition
means too many columns in the indexing row for that partition. I believe
the same is true in Cassandra for their indexing.

If I understood it correctly, if I don't specify partitions, Cassandra
will store all my data in a single node?

Cassandra spreads all your data out on all nodes with or without
partitions. A single partition does have its data co-located though.
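
To illustrate why rows end up spread over the nodes whether or not you
partition: with a hash-based partitioner such as RandomPartitioner,
placement is driven by a hash of the row key. The Python sketch below is a
deliberate simplification (hash modulo node count), NOT Cassandra's actual
token ring, and the function names are hypothetical.

```python
import hashlib


def node_for_row_key(row_key, node_count):
    """Map a row key to a node index by hashing it (simplified placement)."""
    digest = hashlib.md5(row_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % node_count


def placement(row_keys, node_count):
    """Group row keys by the node index they hash to."""
    nodes = {}
    for key in row_keys:
        nodes.setdefault(node_for_row_key(key, node_count), []).append(key)
    return nodes
```

The same row key always hashes to the same node, so everything stored
under one row key stays together, while different row keys scatter across
the cluster.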

If 99,999% of my users will have less than 100k requests, would it make
sense to partition by user?

If you are at 100k (and the requests are rather small), you could embed
all the requests in the user or go with Aaron's suggestion below of a
UserRequestsCF. If your requests are rather large, you probably don't want
to embed them in the User. Either way, it's one query or one row key
lookup.

That's cool! :D So if I need to query data split in 10 partitions, for
instance, I can perform the query in parallel by using a multiget, right?

Multiget ignores partitions... you feed it a LIST of keys and it gets
them. It just so happens that the partitionId has to be part of your row
key.
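
A small Python sketch of that idea: build the LIST of row keys yourself,
one per partition, and hand the whole list to a single multiget. The key
format "<user_id>:<partition>" follows Aaron's <user_id : partition_start>
suggestion with monthly labels assumed, and the multiget function here is
a dict-based stand-in, not a real client API.

```python
def month_partitions(year, month, count):
    """Labels for `count` monthly partitions ending at year-month, newest first."""
    labels = []
    for _ in range(count):
        labels.append("%04d-%02d" % (year, month))
        month -= 1
        if month == 0:
            year, month = year - 1, 12
    return labels


def row_keys_for_user(user_id, partitions):
    """One row key per partition; the partition id is part of the key."""
    return ["%s:%s" % (user_id, p) for p in partitions]


def multiget(column_family, keys):
    """Stand-in for a client multiget: returns the rows that exist."""
    return {k: column_family[k] for k in keys if k in column_family}
```

For example, row_keys_for_user("u1", month_partitions(2012, 9, 10)) yields
the ten keys for the last ten months, which go into one multiget call
rather than ten separate gets.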

Out of curiosity, if each get will occur on a different node, I would need
to connect to each of the nodes? Or would I query 1 node and it would
communicate to others?

I have used Hector and now use Astyanax, so I don't worry much about that
layer, but I feed Astyanax 3 nodes and I believe it discovers some of the
other ones. I believe the latter is true (you query 1 node and it
communicates with the others) but am not 100% sure as I have not looked at
that code.

As an analogy on the above, if you happen to have used PlayOrm, you would
ONLY need one Requests table and you partition by user AND time (two views
into the same data partitioned two different ways) and you can do exactly
the same thing as Aaron's example. PlayOrm doesn't embed the partition ids
in the key, leaving it free to partition twice as in your case. When the
partition id is embedded in the key, a refactor has to map/reduce A LOT
more rows, because other rows hold an FK of the form
<partitionid><subrowkey>; if the partition id is not in the key, you only
map/reduce the partitioned table in a redesign/refactor. That said, we
will be adding support for CQL partitioning in addition to PlayOrm
partitioning even though it can be a little less flexible sometimes.

Also, CQL locates all the data for a partition on one node. We have found
it can "sometimes" be faster, thanks to the parallelized disks, when the
partitions are NOT all on one node, so PlayOrm partitions are virtual only
and do not relate to where the rows are stored. As an example, on our 6
nodes a join query on a partition with 1,000,000 rows took 60ms (of course
I can't compare to CQL here since it doesn't do joins). It really depends
on how much data is going to come back in the query too. There are
tradeoffs between reading from disks in parallel across nodes and having
your data all on one node, of course.

Later,
Dean


From: Marcelo Elias Del Valle <mvallebr@gmail.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Monday, September 24, 2012 7:45 AM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Re: Correct model


2012/9/23 Hiller, Dean <Dean.Hiller@nrel.gov>
You need to split data among partitions or your query won't scale as more
and more data is added to table. Having the partition means you are
querying a lot less rows.
This will happen in case I can query just one partition. But if I need to
query things in multiple partitions, wouldn't it be slower?

He means determine the ONE partition key and query that partition. I.e.,
if you want just latest user requests, figure out the partition key based
on which month you are in and query it. If you want the latest independent
of user, query the correct single partition for GlobalRequests CF.

But in this case, I didn't understand Aaron's model then. My first query
is to get all requests for a user. If I did partitions by time, I will
need to query all partitions to get the results, right? In his answer it
was said I would query ONE partition...

If I want all the requests for the user, couldn't I just select all
UserRequest records which start with "userId"?
He designed it so the user requests table was completely scalable so he
has partitions there. If you don't have partitions, you could run into a
row that is toooo long. You don't need to design it this way if you know
none of your users are going to go into the millions as far as number of
requests. In his design then, you need to pick the correct partition and
query into that partition.
You mean too many rows, not a row too long, right? I am assuming each
request will be a different row, not a new column. Is having billions of
ROWS something non performatic in Cassandra? I know Cassandra allows up to
2 billion columns for a CF, but I am not aware of a limitation for rows...

I really didn't understand why to use partitions.
If you know your rows will go into the trillions, partitions are a way of
breaking them up so each partition has 100k rows or so, or even 1 million,
but maxes out in the millions most likely. Without partitions, you hit a
limit in the millions. With partitions, you can keep scaling past that as
you can have as many partitions as you want.

If I understood it correctly, if I don't specify partitions, Cassandra
will store all my data in a single node? I thought Cassandra would
automatically distribute my data among nodes as I insert rows into a CF.
Of course if I use partitions I understand I could query just one
partition (node) to get the data, if I have the partition field, but to
the best of my knowledge, this is not what happens in my case, right? In
the first query I would have to query all the partitions...
Or you are saying partitions have nothing to do with nodes?? If 99,999% of
my users will have less than 100k requests, would it make sense to
partition by user?

A multi-get is a query that finds IN PARALLEL all the rows with the
matching keys you send to Cassandra. If you do 1000 gets (instead of a
multi-get) with 1ms latency, you will find it takes 1 second + processing
time. If you do ONE multi-get, you only have 1 request and therefore 1ms
latency. The other solution is you could send 1000 "async" gets but I have
a feeling that would be slower with all the marshalling/unmarshalling of
the envelope... it really depends on the envelope size; if we were using
http, for example, you would get killed doing 1000 requests instead of 1
with 1000 keys in it.
That's cool! :D So if I need to query data split in 10 partitions, for
instance, I can perform the query in parallel by using a multiget, right?
Out of curiosity, if each get will occur on a different node, I would need
to connect to each of the nodes? Or would I query 1 node and it would
communicate to others?


Later,
Dean

From: Marcelo Elias Del Valle <mvallebr@gmail.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Sunday, September 23, 2012 10:23 AM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Re: Correct model


2012/9/20 aaron morton <aaron@thelastpickle.com>
I would consider:

# User CF
* row_key: user_id
* columns: user properties, key=value

# UserRequests CF
* row_key: <user_id : partition_start> where partition_start is the start
of a time partition that makes sense in your domain. e.g. partition
monthly. Generally want to avoid rows that grow forever, as a rule of
thumb avoid rows more than a few 10's of MB.
* columns: two possible approaches:
1) If the requests are immutable and you generally want all of the data,
store the request in a single column using JSON or similar, with the
column name a timestamp.
2) Otherwise use a composite column name of <timestamp : request_property>
to store the request in many columns.
* In either case consider using Reversed comparators so the most recent
columns are first, see
http://thelastpickle.com/2011/10/03/Reverse-Comparators/

# GlobalRequests CF
* row_key: partition_start - time partition as above. It may be easier to
use the same partition scheme.
* column name: <timestamp : user_id>
* column value: empty

Ok, I think I understood your suggestion... But the only advantage in this
solution is to split data among partitions? I understood how it would
work, but I didn't understand why it's better than the other solution,
without the GlobalRequests CF

- Select all the requests for a user
Work out the current partition client side, get the first N columns. Then
page.

What do you mean here by current partition? You mean I would perform a
query for each partition? If I want all the requests for the user,
couldn't I just select all UserRequest records which start with "userId"?
I might be missing something here, but in my understanding if I use hector
to query a column family I can do that and Cassandra servers will
automatically communicate to each other to get the data I need, right? Is
it bad? I really didn't understand why to use partitions.


- Select all the users which has new requests, since date D
Work out the current partition client side, get the first N columns from
GlobalRequests, make a multi get call to UserRequests
NOTE: Assuming the size of the global requests space is not huge.
Hope that helps.
For sure it is helping a lot. However, I don't know what is a multiget...
I saw the hector api reference and found this method, but not sure about
what Cassandra would do internally if I do a multiget... Is this expensive
in terms of performance and latency?

--
Marcelo Elias Del Valle
http://mvalle.com - @mvallebr

