Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@cassandra.apache.org
Received-SPF: pass (athena.apache.org: domain of mvallebr@gmail.com designates
 209.85.217.172 as permitted sender)
MIME-Version: 1.0
In-Reply-To: <CC86FF77.11B13%Dean.Hiller@nrel.gov>
References: 
 <CABKQidtrLixcBKzFgcKN0kUGqxZZhSevOzRHFiVpD4BdW7ChVQ@mail.gmail.com>
	<CC86FF77.11B13%Dean.Hiller@nrel.gov>
Date: Tue, 25 Sep 2012 11:18:29 -0300
Message-ID: 
 <CABKQidtNXketihzGP0_LpGeqHy=EvR7X_5j+x4+YFiizRKKsMw@mail.gmail.com>
Subject: Re: Correct model
From: Marcelo Elias Del Valle <mvallebr@gmail.com>
To: user@cassandra.apache.org
Content-Type: multipart/alternative; boundary=f46d04016c2358d56204ca8761fd

--f46d04016c2358d56204ca8761fd
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: quoted-printable

Dean,

    In the playOrm data model, if I understood it correctly, every CF has
its own id, right? For instance, User would have its own id, Activities
would have its own id, etc. What if I have a trillion activities? Wouldn't
it be a problem to have one row id for each activity?
     Cassandra always indexes by row id, right? If I have too many row ids
without using composite keys, will it scale the same way? Wouldn't inserting
an activity take longer and longer as the number of activities grows?

Best regards,
Marcelo Valle.

2012/9/25 Hiller, Dean <Dean.Hiller@nrel.gov>

> If you need anything added/fixed, just let PlayOrm know.  So far PlayOrm
> has been able to add things quickly... that may change as more and more
> requests come in, but so far it seems to have managed to keep up.
>
> We are already using it live, by the way.  It works out very well so far
> for us (we have 5000 column families, obviously created dynamically instead
> of by hand... a very interesting use case of Cassandra).  In our live
> environment we configured Astyanax with LOCAL_QUORUM on reads AND writes, so
> CP style: we can afford one node out of 3 going down, but if two go down
> it stops working.  THOUGH there is a patch in Astyanax to auto-switch from
> LOCAL_QUORUM to ONE for reads/writes when two nodes go down, which we would
> like to pull in eventually so it is always live (I don't think Hector has
> that, and it is a really NICE feature, i.e. fail the LOCAL_QUORUM read/write
> and then try again with a consistency level of ONE).
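
A minimal sketch of the "fail at quorum, then retry at ONE" idea described above, for readers who want to see its shape. This is not the Astyanax patch itself: the Consistency enum, UnavailableException and ConsistencyFallback are hypothetical names invented for illustration, and a real client would surface its own exception types.

```java
import java.util.function.Function;

/** Hypothetical consistency levels; real clients define their own enum. */
enum Consistency { LOCAL_QUORUM, ONE }

/** Hypothetical stand-in for a client's "not enough live replicas" error. */
class UnavailableException extends RuntimeException {
    UnavailableException(String message) { super(message); }
}

final class ConsistencyFallback {
    /**
     * Runs the operation at LOCAL_QUORUM first. If the cluster reports that
     * too few replicas are alive, retries exactly once at ONE.
     */
    static <T> T execute(Function<Consistency, T> operation) {
        try {
            return operation.apply(Consistency.LOCAL_QUORUM);
        } catch (UnavailableException e) {
            // Trades consistency for availability: a read at ONE may be stale.
            return operation.apply(Consistency.ONE);
        }
    }
}
```

The caller passes a function that performs the read or write at whatever level it is handed, so the retry policy stays in one place. Note the trade-off: once the fallback kicks in, the setup is no longer CP.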
>
> Later,
> Dean
>
>
> From: Marcelo Elias Del Valle <mvallebr@gmail.com>
> Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
> Date: Monday, September 24, 2012 1:54 PM
> To: "user@cassandra.apache.org" <user@cassandra.apache.org>
> Subject: Re: Correct model
>
> Dean, this sounds like magic :D
> I don't know the details of the performance of the index implementations
> you chose, but it would pave the way to using it in my case, as I don't need
> the best performance in the world when reading, but I do need to ensure
> scalability and have a simple model to maintain. I liked the playOrm
> concept regarding this.
> I have more questions, but I will ask them on Stack Overflow from now on.
>
> 2012/9/24 Hiller, Dean <Dean.Hiller@nrel.gov>
> PlayOrm will automatically create a CF to index my CF?
>
> It creates 3 CFs for all indices (IntegerIndice, DecimalIndice, and
> StringIndice) so that the ad-hoc tool in development can display the
> indices: it knows the prefix of the composite column name is an Integer,
> Decimal or String, and it knows the postfix type as well, so it can
> translate back from bytes to the types and display them properly in a GUI
> (i.e. on top of SELECT, the ad-hoc tool is adding a way to view the index
> rows so you can check whether they got corrupted).
>
> Will it auto-manage it, like Cassandra's secondary indexes?
>
> YES
>
> Further detail...
>
> You annotate fields with @NoSqlIndexed and PlayOrm adds to/removes from the
> index as you add/modify/remove the entity... a modify removes the old value
> from the index and inserts the new value into the index.
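
To make the add/modify/remove bookkeeping above concrete, here is a small in-memory sketch. It only illustrates the rule "a modify is a remove of the old value plus an insert of the new one"; FieldIndex is a hypothetical class, not part of PlayOrm, and a TreeMap stands in for the index column family.

```java
import java.util.Collections;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** In-memory stand-in for one index: indexed value -> row keys holding it. */
final class FieldIndex<V extends Comparable<V>> {
    private final TreeMap<V, Set<String>> entries = new TreeMap<>();

    /** Called when an entity is first saved. */
    void add(V value, String rowKey) {
        entries.computeIfAbsent(value, v -> new TreeSet<>()).add(rowKey);
    }

    /** Called when an entity is deleted. */
    void remove(V value, String rowKey) {
        Set<String> keys = entries.get(value);
        if (keys == null) return;
        keys.remove(rowKey);
        if (keys.isEmpty()) entries.remove(value);
    }

    /** A modify removes the old value's entry, then inserts the new one. */
    void modify(V oldValue, V newValue, String rowKey) {
        remove(oldValue, rowKey);
        add(newValue, rowKey);
    }

    /** Row keys currently indexed under the given value. */
    Set<String> lookup(V value) {
        return entries.getOrDefault(value, Collections.emptySet());
    }
}
```

The point of doing the remove first is that the index never keeps a stale entry pointing at a row whose field no longer has that value.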
>
> An example: PlayOrm stores all long, int, short and byte values in a type
> that uses the least amount of space, so IF you have a long OR BigInteger
> between -128 and 127 it only ends up storing 1 byte in Cassandra (SAVING
> tons of space!!!).  Then, if you are indexing a field of one of those
> types, PlayOrm creates an IntegerIndice table.
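
One way to get the "smallest number of bytes" behaviour described above is the minimal two's-complement form that java.math.BigInteger already produces. Whether PlayOrm encodes values exactly this way is an assumption on my part; CompactIntegers is a hypothetical helper for illustration only.

```java
import java.math.BigInteger;

final class CompactIntegers {
    /** Encodes a whole number in the fewest two's-complement bytes that hold it. */
    static byte[] encode(long value) {
        // toByteArray() returns the minimal big-endian representation,
        // including one sign bit, so values that fit in a byte take 1 byte.
        return BigInteger.valueOf(value).toByteArray();
    }

    /** Decodes bytes produced by encode(); throws if the value exceeds a long. */
    static long decode(byte[] bytes) {
        return new BigInteger(bytes).longValueExact();
    }
}
```

With this encoding a long holding 100 costs 1 byte on disk rather than 8, and the full 8 bytes are only paid for values that actually need them.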
>
> Right now, another guy is working on playorm-server, a web GUI that allows
> ad-hoc access to all your data, so you can run ad-hoc queries to see the
> data; instead of showing hex, it shows the real values by translating the
> bytes to Strings, at least for the schema portions that it is aware of.
>
> Later,
> Dean
>
> From: Marcelo Elias Del Valle <mvallebr@gmail.com>
> Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
> Date: Monday, September 24, 2012 12:09 PM
> To: "user@cassandra.apache.org" <user@cassandra.apache.org>
> Subject: Re: Correct model
>
> Dean,
>
>     There is one last thing I would like to ask about playOrm on this
> list; the next questions will go to Stack Overflow. Just because of the
> context, I prefer asking this here:
>      When you say playOrm indexes a table (which would be a CF behind the
> scenes), what do you mean? PlayOrm will automatically create a CF to index
> my CF? Will it auto-manage it, like Cassandra's secondary indexes?
>      In Cassandra, the application is responsible for maintaining the
> index, right? I might be wrong, but unless I am using secondary indexes I
> need to update index values manually, right?
>      I got confused when you said "PlayOrm indexes the columns you
> choose". How do I choose, and what exactly does it mean?
>
> Best regards,
> Marcelo Valle.
>
> 2012/9/24 Hiller, Dean <Dean.Hiller@nrel.gov>
> Oh, ok, you were talking about the wide row pattern, right?
>
> yes
>
> But playORM is compatible with Aaron's model, isn't it?
>
> Not yet. PlayOrm supports partitioning one table multiple ways, as it
> indexes the columns (in your case, the userid FK column and the time column).
>
> Can I map exactly this using playORM?
>
> Not yet, but the plan is to map these typical Cassandra scenarios as well.
>
>  Can I ask playOrm questions in this list?
>
> The best place to ask PlayOrm questions is Stack Overflow, tagged with
> PlayOrm, though I monitor both this list and Stack Overflow for questions
> (there are already a few questions on Stack Overflow).
>
> The examples directory is empty for now, I would like to see how to set up
> the connection with it.
>
> Running build or build.bat is always kept working and all 62 tests pass (or
> we don't merge to master), so to see how to make a connection or run an
> example:
>
>  1.  Run build.bat or build, which generates the parsing code.
>  2.  Import into Eclipse (.classpath and .project are already there for
> you).
>  3.  In FactorySingleton.java you can switch between IN_MEMORY and CASSANDRA
> and run any of the tests in memory or against localhost. (We also run the
> test suite against a 6-node cluster and everything passes.)
>  4.  FactorySingleton probably has the code you are looking for, plus you
> need a class called nosql.Persistence or it won't scan your jar file (a
> class file, not an XML file like JPA).
>
> Do you mean I need to load all the keys in memory to do a multi get?
>
> No, you batch.  I am not sure about CQL, but PlayOrm returns a Cursor, not
> the results, so you can loop through every key while behind the scenes it
> does batch requests: you load up 100 keys and make one multiget request
> for those 100 keys, then load up the next 100 keys, and so on.  PlayOrm
> supports this style of batching today.  I need to look more into the APIs
> and protocol of CQL to see if it allows it; Aaron would know.
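
The batching described above can be sketched as an iterator that pulls keys in groups and issues one multiget per group. This is an illustration under stated assumptions, not PlayOrm's Cursor: BatchingCursor is a hypothetical class, and the multiGet function is a placeholder for whatever multiget call your client offers.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.function.Function;

/** Iterates over rows for a stream of keys, fetching them one batch at a time. */
final class BatchingCursor<K, R> implements Iterator<R> {
    private final Iterator<K> keys;
    private final int batchSize;
    private final Function<List<K>, Map<K, R>> multiGet; // hypothetical multiget call
    private Iterator<R> currentBatch = Collections.emptyIterator();

    BatchingCursor(Iterator<K> keys, int batchSize, Function<List<K>, Map<K, R>> multiGet) {
        if (batchSize <= 0) throw new IllegalArgumentException("batchSize must be positive");
        this.keys = keys;
        this.batchSize = batchSize;
        this.multiGet = multiGet;
    }

    @Override
    public boolean hasNext() {
        // Fetch the next batch only once the current one is exhausted,
        // so at most batchSize rows are held in memory at any time.
        while (!currentBatch.hasNext() && keys.hasNext()) {
            List<K> batch = new ArrayList<>(batchSize);
            while (batch.size() < batchSize && keys.hasNext()) {
                batch.add(keys.next());
            }
            Map<K, R> rows = multiGet.apply(batch);
            List<R> found = new ArrayList<>();
            for (K key : batch) { // preserve key order, skip keys with no row
                R row = rows.get(key);
                if (row != null) found.add(row);
            }
            currentBatch = found.iterator();
        }
        return currentBatch.hasNext();
    }

    @Override
    public R next() {
        if (!hasNext()) throw new NoSuchElementException();
        return currentBatch.next();
    }
}
```

Nothing is fetched until the caller starts iterating, and with 250 keys and a batch size of 100 the sketch makes three multiget calls in total.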
>
> Why did you move? Hector is being considered for being the "official"
> client for Cassandra, isn't it?
>
> At the time, I wanted the file streaming feature.  Also, Hector seemed a
> bit cumbersome as well compared to astyanax or at least if you were
> building a platform and had no use for typing the columns.  Just personal
> preference really here.
>
> I am not sure I understood this part. If I need to refactor, having the
> partition id in the key would be a bad thing? What would be the
> alternative? In my case, as I use userId : partitionId as row key, this
> might be a problem, right?
>
> PlayOrm indexes the columns you choose (i.e. the ones you want to use in the
> where clause) and partitions by columns you choose, not based on the key, so
> in PlayOrm the key is typically a TimeUUID or something cluster
> unique... any tables referencing that TimeUUID never have to change.  With
> Cassandra partitioning, if you repartition that table a different way or go
> for some kind of major change (usually done with map/reduce), all your
> foreign keys "may" have to change... it really depends on the situation
> though.  Maybe you get the design right and never have to change.
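
A toy model of why keeping the partition id out of the row key helps in a refactor. Rows keep a stable key, and partition membership lives in a side index that can be rebuilt on its own. VirtualPartitions is a hypothetical class and the in-memory maps only stand in for column families; this is not how PlayOrm stores anything.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Function;

/** Rows keep a stable key; partition membership lives only in a side index. */
final class VirtualPartitions<R> {
    private final Map<String, R> rowsByKey = new HashMap<>();
    private Map<String, Set<String>> keysByPartition = new HashMap<>();
    private Function<R, String> partitionOf;

    VirtualPartitions(Function<R, String> partitionOf) {
        this.partitionOf = partitionOf;
    }

    void put(String rowKey, R row) {
        R previous = rowsByKey.put(rowKey, row);
        if (previous != null) { // drop the stale index entry on overwrite
            Set<String> old = keysByPartition.get(partitionOf.apply(previous));
            if (old != null) old.remove(rowKey);
        }
        keysByPartition.computeIfAbsent(partitionOf.apply(row), p -> new TreeSet<>()).add(rowKey);
    }

    /** Re-partitions by a different column: only the side index is rebuilt. */
    void repartition(Function<R, String> newPartitionOf) {
        Map<String, Set<String>> rebuilt = new HashMap<>();
        for (Map.Entry<String, R> e : rowsByKey.entrySet()) {
            rebuilt.computeIfAbsent(newPartitionOf.apply(e.getValue()), p -> new TreeSet<>())
                   .add(e.getKey());
        }
        partitionOf = newPartitionOf;
        keysByPartition = rebuilt;
    }

    /** Row keys currently assigned to the given partition. */
    Set<String> keysIn(String partitionId) {
        return keysByPartition.getOrDefault(partitionId, Collections.emptySet());
    }
}
```

Because row keys never change in repartition(), anything elsewhere that references a row by key stays valid. If the partition id were embedded in the key, every such reference would need rewriting too.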
>
> @NoSqlQuery(name="findWithJoinQuery",
>     query="PARTITIONS t(:partId) SELECT t FROM TABLE as t "+
>     "INNER JOIN t.activityTypeInfo as i WHERE i.type = :type and t.numShares < :shares"),
>
> What would happen behind the scenes when I execute this query?
>
> In this case, t or TABLE is a partitioned table, since a partition is
> defined, and t.activityTypeInfo refers to the ActivityTypeInfo table, which
> is not partitioned (AND ActivityTypeInfo won't scale to billions of rows
> because there is no partitioning, but maybe you don't need it!!!).  Behind
> the scenes, when you call getResult, it returns a cursor that has NOT done
> anything yet.  When you start looping through the cursor, it batches
> requests behind the scenes, asking for the next 500 matches (configurable),
> so you never run out of memory... it is EXACTLY like a database cursor.  You
> can even use the cursor to show a user the first set of results and, when
> the user clicks next, pick up right where the cursor left off (if you saved
> it to the HttpSession).
>
> You can only use joins with partition keys, right?
>
> Nope, joins work on anything.  You only need to specify the partitionId
> when you have a partitioned table in the list of join tables (that is what
> the PARTITIONS clause is for: to identify partitionId = what?).  It was put
> BEFORE the SQL instead of within it... CQL took the opposite approach, but
> PlayOrm can also join different partitions together ;)
>
> In this case, is partId the row id of TABLE CF?
>
> Nope, partId is one of the columns.  There is a test case on this class in
> PlayOrm (notice the annotation NoSqlPartitionByThisField on the
> column/field in the entity):
>
>
> https://github.com/deanhiller/playorm/blob/master/input/javasrc/com/alvazan/test/db/PartitionedSingleTrade.java
>
> PlayOrm allows partitioned tables AND non-partitioned tables
> (non-partitioned tables won't scale, but maybe you will never have that
> many rows).  You can join any combination of the two (non-partitioned with
> partitioned, non-partitioned with non-partitioned, one partition with
> another partition).
>
> I only prefer Stack Overflow because I like referencing links/questions by
> their URLs.  Referencing this email later on is very hard, as I have to
> find it, so in general I HATE email lists ;) but it seems Cassandra prefers
> them.  You can put any PlayOrm questions there; I am not sure how many
> people on this list are interested, so that also creates less noise here.
>
> Later,
> Dean
>
>
> From: Marcelo Elias Del Valle <mvallebr@gmail.com>
> Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
> Date: Monday, September 24, 2012 11:07 AM
> To: "user@cassandra.apache.org" <user@cassandra.apache.org>
> Subject: Re: Correct model
>
>
>
> 2012/9/24 Hiller, Dean <Dean.Hiller@nrel.gov>
> I am confused.  In this email you say you want "get all requests for a
> user" and in a previous one you said "Select all the users which has new
> requests, since date D" so let me answer both...
>
> I have both needs. These are the two queries I need to perform on the
> model.
>
> For the latter, you make ONE query into the latest partition (ONE partition)
> of the GlobalRequestsCF, which gives you the most recent requests ALONG with
> the user ids of those requests.  If you queried all partitions, you would
> most likely blow out your JVM memory.
>
> For the former, you make ONE query to the UserRequestsCF with userid =
> <your user id> to get all the requests for that user.
>
> Now I think I got the main idea! This answered a lot!
>
> Sorry, I was skipping some context.  A lot of the backing indexing
> is sometimes done as one long row, so in PlayOrm too many rows in a
> partition means too many columns in the indexing row for that partition.  I
> believe the same is true in Cassandra for its indexing.
>
> Oh, ok, you were talking about the wide row pattern, right? But playORM is
> compatible with Aaron's model, isn't it? Can I map exactly this using
> playORM? The hardest thing for me in using playORM now is that I don't know
> Cassandra well yet, and I know playORM even less. Can I ask playOrm
> questions in this list? I will try to create a POC here!
> Only now I am starting to understand what it does ;-) The examples
> directory is empty for now, I would like to see how to set up the
> connection with it.
>
> Cassandra spreads all your data out on all nodes with or without
> partitions.  A single partition does have its data co-located though.
>
> Now I see. The main advantage of using partitions is keeping the indexes
> small enough. It has nothing to do with the nodes. Thanks!
>
> If you are at 100k (and the requests are rather small), you could embed all
> the requests in the user or go with Aaron's suggestion below of a
> UserRequestsCF.  If your requests are rather large, you probably don't want
> to embed them in the User.  Either way, it's one query or one row key
> lookup.
>
> I see it now.
>
> Multiget ignores partitions... you feed it a LIST of keys and it gets them.
> It just so happens that partitionId had to be part of your row key.
>
> Do you mean I need to load all the keys in memory to do a multiget?
>
> I have used Hector and now use Astyanax, I don't worry much about that
> layer, but I feed astyanax 3 nodes and I believe it discovers some of the
> other ones.  I believe the latter is true but am not 100% sure as I have
> not looked at that code.
>
> Why did you move? Hector is being considered for being the "official"
> client for Cassandra, isn't it? I looked at the Astyanax API and it seemed
> much more high level, though.
>
> As an analogy on the above, if you used PlayOrm, you would ONLY need one
> Requests table partitioned by user AND by time (two views into the same
> data, partitioned two different ways), and you could do exactly the same
> thing as Aaron's example.  PlayOrm doesn't embed the partition ids in the
> key, leaving it free to partition twice like in your case.  In a refactor
> with the partition id in the key, you have to map/reduce A LOT more rows,
> because other rows hold the FK of <partitionid><subrowkey>; if you don't
> have the partition id in the key, you only map/reduce the partitioned table
> in a redesign/refactor.  That said, we will be adding support for CQL
> partitioning in addition to PlayOrm partitioning, even though it can be a
> little less flexible sometimes.
>
> I am not sure I understood this part. If I need to refactor, having the
> partition id in the key would be a bad thing? What would be the
> alternative? In my case, as I use userId : partitionId as row key, this
> might be a problem, right?
>
> Also, CQL locates all the data for a partition on one node.  We have found
> that, with parallelized disks, it can "sometimes" be faster when the
> partitions are NOT all on one node, so PlayOrm partitions are virtual only
> and do not relate to where the rows are stored.  An example on our 6 nodes:
> a join query on a partition with 1,000,000 rows took 60ms (of course I
> can't compare to CQL here since it doesn't do joins).  It really depends on
> how much data is going to come back in the query, though.  There are
> tradeoffs between disk-parallel nodes and having your data all on one node,
> of course.
>
> I guess I am still not ready for this level of info. :D
> In the playORM readme, we have the following:
>
>
> @NoSqlQuery(name="findWithJoinQuery",
>     query="PARTITIONS t(:partId) SELECT t FROM TABLE as t "+
>     "INNER JOIN t.activityTypeInfo as i WHERE i.type = :type and t.numShares < :shares"),
>
> What would happen behind the scenes when I execute this query? You can
> only use joins with partition keys, right?
> In this case, is partId the row id of TABLE CF?
>
>
> Thanks a lot for the answers
>
> --
> Marcelo Elias Del Valle
> http://mvalle.com - @mvallebr
>


-- 
Marcelo Elias Del Valle
http://mvalle.com - @mvallebr

--f46d04016c2358d56204ca8761fd
Content-Type: text/html; charset=windows-1252
Content-Transfer-Encoding: quoted-printable

Dean,=A0<div><br></div><div>=A0 =A0 In the playOrm data modeling, if I unde=
rstood it correctly, every CF has its own id, right? For instance, User wou=
ld have its own ID, Activities would have its own id, etc. What if I have a=
 trillion activities? Wouldn&#39;t be a problem to have 1 row id for each a=
ctivity?</div>
<div>=A0 =A0 =A0Cassandra always indexes by row id, right? If I have too ma=
ny row ids without using composite keys, will it scale the same way? Wouldn=
&#39;t the time to insert an activity be each time longer because I have to=
o many activities?</div>
<div><br></div><div>Best regards,</div><div>Marcelo Valle.<br><br><div clas=
s=3D"gmail_quote">2012/9/25 Hiller, Dean <span dir=3D"ltr">&lt;<a href=3D"m=
ailto:Dean.Hiller@nrel.gov" target=3D"_blank">Dean.Hiller@nrel.gov</a>&gt;<=
/span><br>
<blockquote class=3D"gmail_quote" style=3D"margin:0 0 0 .8ex;border-left:1p=
x #ccc solid;padding-left:1ex">If you need anything added/fixed, just let P=
layOrm know. =A0PlayOrm has been able to quickly add so far=85that may chan=
ge as more and more requests come but so far PlayOrm seems to have managed =
to keep up.<br>

<br>
We are using it live by the way already. =A0It works out very well so far f=
or us (We have 5000 column families, obviously dynamically created instead =
of by hand=85a very interesting use case of cassandra). =A0In our live envi=
ronment we configured astyanax with LocalQUOROM on reads AND writes so CP s=
tyle so we can afford one node out of 3 to go down but if two go down it st=
ops working THOUGH there is a patch in astyanax to auto switch from LocalQU=
OROM to ONE NODE read/write when two nodes go down that we would like to su=
ck in eventually so it is always live(I don&#39;t think Hector has that and=
 it is a really NICE feature=85.ie fail localquorm read/write and then try =
again with consistency level of one).<br>

<div class=3D"im"><br>
Later,<br>
Dean<br>
<br>
<br>
From: Marcelo Elias Del Valle &lt;<a href=3D"mailto:mvallebr@gmail.com">mva=
llebr@gmail.com</a>&lt;mailto:<a href=3D"mailto:mvallebr@gmail.com">mvalleb=
r@gmail.com</a>&gt;&gt;<br>
Reply-To: &quot;<a href=3D"mailto:user@cassandra.apache.org">user@cassandra=
.apache.org</a>&lt;mailto:<a href=3D"mailto:user@cassandra.apache.org">user=
@cassandra.apache.org</a>&gt;&quot; &lt;<a href=3D"mailto:user@cassandra.ap=
ache.org">user@cassandra.apache.org</a>&lt;mailto:<a href=3D"mailto:user@ca=
ssandra.apache.org">user@cassandra.apache.org</a>&gt;&gt;<br>

</div>Date: Monday, September 24, 2012 1:54 PM<br>
<div class=3D"im">To: &quot;<a href=3D"mailto:user@cassandra.apache.org">us=
er@cassandra.apache.org</a>&lt;mailto:<a href=3D"mailto:user@cassandra.apac=
he.org">user@cassandra.apache.org</a>&gt;&quot; &lt;<a href=3D"mailto:user@=
cassandra.apache.org">user@cassandra.apache.org</a>&lt;mailto:<a href=3D"ma=
ilto:user@cassandra.apache.org">user@cassandra.apache.org</a>&gt;&gt;<br>

Subject: Re: Correct model<br>
<br>
</div><div class=3D"im">Dean, this sounds like magic :D<br>
I don&#39;t know details about the performance on the index implementations=
 you chose, but it would pay the way to use it in my case, as I don&#39;t n=
eed the best performance in the world when reading, but I need to assure sc=
alability and have a simple model to maintain. I liked the playOrm concept =
regarding this.<br>

I have more doubts, but I will ask them at stack over flow from now on.<br>
<br>
</div>2012/9/24 Hiller, Dean &lt;<a href=3D"mailto:Dean.Hiller@nrel.gov">De=
an.Hiller@nrel.gov</a>&lt;mailto:<a href=3D"mailto:Dean.Hiller@nrel.gov">De=
an.Hiller@nrel.gov</a>&gt;&gt;<br>
<div class=3D"im">PlayOrm will automatically create a CF to index my CF?<br=
>
<br>
It creates 3 CF&#39;s for all indices, IntegerIndice, DecimalIndice, and St=
ringIndice such that the ad-hoc tool that is in development can display the=
 indices as it knows the prefix of the composite column name is of Integer,=
 Decimal or String and it knows the postfix type as well so it can translat=
e back from bytes to the types and properly display in a GUI (i.e. On top o=
f SELECT, the ad-hoc tool is adding a way to view the induce rows so you ca=
n check if they got corrupt or not).<br>

<br>
Will it auto-manage it, like Cassandra&#39;s secondary indexes?<br>
<br>
YES<br>
<br>
Further detail=85<br>
<br>
You annotated fields with @NoSqlIndexed and PlayOrm adds/removes from the i=
ndex as you add/modify/remove the entity=85..a modify does a remove old val=
 from index and insert new value into index.<br>
<br>
An example would be PlayOrm stores all long, int, short, byte in a type tha=
t uses the least amount of space so IF you have a long OR BigInteger betwee=
n =96128 to 128 it only ends up storing 1 byte in cassandra(SAVING tons of =
space!!!). =A0Then if you are indexing a type that is one of those, PlayOrm=
 creates a IntegerIndice table.<br>

<br>
Right now, another guy is working on playorm-server which is a webgui to al=
low ad-hoc access to all your data as well so you can ad-hoc queries to see=
 data and instead of showing Hex, it shows the real values by translating t=
he bytes to String for the schema portions that it is aware of that is.<br>

<br>
Later,<br>
Dean<br>
<br>
</div><div class=3D"im">From: Marcelo Elias Del Valle &lt;<a href=3D"mailto=
:mvallebr@gmail.com">mvallebr@gmail.com</a>&lt;mailto:<a href=3D"mailto:mva=
llebr@gmail.com">mvallebr@gmail.com</a>&gt;&lt;mailto:<a href=3D"mailto:mva=
llebr@gmail.com">mvallebr@gmail.com</a>&lt;mailto:<a href=3D"mailto:mvalleb=
r@gmail.com">mvallebr@gmail.com</a>&gt;&gt;&gt;<br>

Reply-To: &quot;<a href=3D"mailto:user@cassandra.apache.org">user@cassandra=
.apache.org</a>&lt;mailto:<a href=3D"mailto:user@cassandra.apache.org">user=
@cassandra.apache.org</a>&gt;&lt;mailto:<a href=3D"mailto:user@cassandra.ap=
ache.org">user@cassandra.apache.org</a>&lt;mailto:<a href=3D"mailto:user@ca=
ssandra.apache.org">user@cassandra.apache.org</a>&gt;&gt;&quot; &lt;<a href=
=3D"mailto:user@cassandra.apache.org">user@cassandra.apache.org</a>&lt;mail=
to:<a href=3D"mailto:user@cassandra.apache.org">user@cassandra.apache.org</=
a>&gt;&lt;mailto:<a href=3D"mailto:user@cassandra.apache.org">user@cassandr=
a.apache.org</a>&lt;mailto:<a href=3D"mailto:user@cassandra.apache.org">use=
r@cassandra.apache.org</a>&gt;&gt;&gt;<br>

</div><div class=3D"im">Date: Monday, September 24, 2012 12:09 PM<br>
</div><div class=3D"im">To: &quot;<a href=3D"mailto:user@cassandra.apache.o=
rg">user@cassandra.apache.org</a>&lt;mailto:<a href=3D"mailto:user@cassandr=
a.apache.org">user@cassandra.apache.org</a>&gt;&lt;mailto:<a href=3D"mailto=
:user@cassandra.apache.org">user@cassandra.apache.org</a>&lt;mailto:<a href=
=3D"mailto:user@cassandra.apache.org">user@cassandra.apache.org</a>&gt;&gt;=
&quot; &lt;<a href=3D"mailto:user@cassandra.apache.org">user@cassandra.apac=
he.org</a>&lt;mailto:<a href=3D"mailto:user@cassandra.apache.org">user@cass=
andra.apache.org</a>&gt;&lt;mailto:<a href=3D"mailto:user@cassandra.apache.=
org">user@cassandra.apache.org</a>&lt;mailto:<a href=3D"mailto:user@cassand=
ra.apache.org">user@cassandra.apache.org</a>&gt;&gt;&gt;<br>

Subject: Re: Correct model<br>
<br>
</div><div class=3D"im">Dean,<br>
<br>
=A0 =A0 There is one last thing I would like to ask about playOrm by this l=
ist, the next questiosn will come by stackOverflow. Just because of the con=
text, I prefer asking this here:<br>
=A0 =A0 =A0When you say playOrm indexes a table (which would be a CF behind=
 the scenes), what do you mean? PlayOrm will automatically create a CF to i=
ndex my CF? Will it auto-manage it, like Cassandra&#39;s secondary indexes?=
<br>

=A0 =A0 =A0In Cassandra, the application is responsible for maintaining the=
 index, right? I might be wrong, but unless I am using secondary indexes I =
need to update index values manually, right?<br>
=A0 =A0 =A0I got confused when you said &quot;PlayOrm indexes the columns y=
ou choose&quot;. How do I choose and what exactly it means?<br>
<br>
Best regards,<br>
Marcelo Valle.<br>
<br>
</div>2012/9/24 Hiller, Dean &lt;<a href=3D"mailto:Dean.Hiller@nrel.gov">De=
an.Hiller@nrel.gov</a>&lt;mailto:<a href=3D"mailto:Dean.Hiller@nrel.gov">De=
an.Hiller@nrel.gov</a>&gt;&lt;mailto:<a href=3D"mailto:Dean.Hiller@nrel.gov=
">Dean.Hiller@nrel.gov</a>&lt;mailto:<a href=3D"mailto:Dean.Hiller@nrel.gov=
">Dean.Hiller@nrel.gov</a>&gt;&gt;&gt;<br>

<div><div class=3D"h5">Oh, ok, you were talking about the wide row pattern,=
 right?<br>
<br>
yes<br>
<br>
But playORM is compatible with Aaron&#39;s model, isn&#39;t it?<br>
<br>
Not yet, PlayOrm supports partitioning one table multiple ways as it indexe=
s the columns(in your case, the userid FK column and the time column)<br>
<br>
Can I map exactly this using playORM?<br>
<br>
Not yet, but the plan is to map these typical Cassandra scenarios as well.<=
br>
<br>
=A0Can I ask playOrm questions in this list?<br>
<br>
The best place to ask PlayOrm questions is on stack overflow and tag with P=
layOrm though I monitor this list and stack overflow for questions(there ar=
e already a few questions on stack overflow).<br>
<br>
The examples directory is empty for now, I would like to see how to set up =
the connection with it.<br>
<br>
Running build or build.bat is always kept working and all 62 tests pass(or =
we don&#39;t merge to master) so to see how to make a connection or run an =
example<br>
<br>
=A01. =A0Run build.bat or build which generates parsing code<br>
=A02. =A0Import into eclipse (it already has .classpath and .project for yo=
u already there)<br>
=A03. =A0In FactorySingleton.java you can modify IN_MEMORY to CASSANDRA or =
not and run any of the tests in-memory or against localhost(We run the test=
 suite also against a 6 node cluster as well and all passes)<br>
=A04. =A0FactorySingleton probably has the code you are looking for plus yo=
u need a class called nosql.Persistence or it won&#39;t scan your jar file.=
(class file not xml file like JPA)<br>
<br>
Do you mean I need to load all the keys in memory to do a multi get?<br>
<br>
No, you batch. =A0I am not sure about CQL, but PlayOrm returns a Cursor not=
 the results so you can loop through every key and behind the scenes it is =
doing batch requests so you can load up 100 keys and make one multi get req=
uest for those 100 keys and then can load up the next 100 keys, etc. etc. e=
tc. =A0I need to look more into the apis and protocol of CQL to see if it a=
llows this style of batching. =A0PlayOrm does support this style of batchin=
g today. =A0Aaron would know if CQL does.<br>

<br>
Why did you move? Hector is being considered for being the &quot;official&q=
uot; client for Cassandra, isn&#39;t it?<br>
<br>
At the time, I wanted the file streaming feature. =A0Also, Hector seemed a =
bit cumbersome as well compared to astyanax or at least if you were buildin=
g a platform and had no use for typing the columns. =A0Just personal prefer=
ence really here.<br>

<br>
I am not sure I understood this part. If I need to refactor, having the par=
tition id in the key would be a bad thing? What would be the alternative? I=
n my case, as I use userId : partitionId as row key, this might be a proble=
m, right?<br>

<br>
PlayOrm indexes the columns you choose(ie. The ones you want to use in the =
where clause) and partitions by columns you choose not based on the key so =
in PlayOrm, the key is typically a TimeUUID or something cluster unique=85.=
.any tables referencing that TimeUUID never have to change. =A0With Cassand=
ra partitioning, if you repartition that table a different way or go for so=
me kind of major change(usually done with map/reduce), all your foreign key=
s &quot;may&quot; have to change=85.it really depends on the situation thou=
gh. =A0Maybe you get the design right and never have to change.<br>

<br>
@NoSqlQuery(name=3D&quot;findWithJoinQuery&quot;, query=3D&quot;PARTITIONS =
t(:partId) SELECT t FROM TABLE as t &quot;+<br>
&quot;INNER JOIN t.activityTypeInfo as i WHERE i.type =3D :type and t.numSh=
ares &lt; :shares&quot;),<br>
<br>
What would happen behind the scenes when I execute this query?<br>
<br>
In this case, t or TABLE is a partitioned table since a partition is
defined, and t.activityTypeInfo refers to the ActivityTypeInfo table,
which is not partitioned (AND ActivityTypeInfo won&#39;t scale to billions
of rows because there is no partitioning, but maybe you don&#39;t need
it!!!). =A0When you call getResult, it returns a cursor that has NOT done
anything yet. =A0When you start looping through the cursor, behind the
scenes it batches requests, asking for the next 500 matches
(configurable) at a time, so you never run out of memory. =A0It is EXACTLY
like a database cursor. =A0You can even use the cursor to show a user the
first set of results and, when the user clicks next, pick up right where
the cursor left off (if you saved it to the HttpSession).<br>

<br>
You can only use joins with partition keys, right?<br>
<br>
Nope, joins work on anything. =A0You only need to specify the partitionId
when you have a partitioned table in the list of join tables. =A0That is
what the PARTITIONS clause is for: it identifies which partitionId to
use. =A0It was put BEFORE the SQL instead of within it (CQL took the
opposite approach), and PlayOrm can also join different partitions
together ;).<br>

<br>
In this case, is partId the row id of TABLE CF?<br>
<br>
Nope, partId is one of the columns. =A0There is a test class for this in
PlayOrm (notice the annotation NoSqlPartitionByThisField on the
column/field in the entity):<br>
<br>
<a href=3D"https://github.com/deanhiller/playorm/blob/master/input/javasrc/=
com/alvazan/test/db/PartitionedSingleTrade.java" target=3D"_blank">https://=
github.com/deanhiller/playorm/blob/master/input/javasrc/com/alvazan/test/db=
/PartitionedSingleTrade.java</a><br>

<br>
PlayOrm allows partitioned tables AND non-partitioned tables
(non-partitioned tables won&#39;t scale, but maybe you will never have
that many rows). =A0You can join any combination of the two
(non-partitioned with partitioned, non-partitioned with non-partitioned,
or one partition with another partition).<br>

<br>
I prefer Stack Overflow only because I like referencing questions by
their URLs. =A0Referencing this email later is very hard, as I have to
find it first, so in general I HATE email lists ;) but it seems Cassandra
prefers them. =A0You can put any PlayOrm questions on Stack Overflow; I am
not sure how many people on this list are interested, so that creates
less noise here too.<br>

<br>
Later,<br>
Dean<br>
<br>
<br>
</div></div>From: Marcelo Elias Del Valle &lt;<a href=3D"mailto:mvallebr@gm=
ail.com">mvallebr@gmail.com</a>&gt;<br>

Reply-To: &quot;<a href=3D"mailto:user@cassandra.apache.org">user@cassandra=
.apache.org</a>&quot; &lt;<a href=3D"mailto:user@cassandra.apache.org">user=
@cassandra.apache.org</a>&gt;<br>

<div class=3D"im">Date: Monday, September 24, 2012 11:07 AM<br>
</div>To: &quot;<a href=3D"mailto:user@cassandra.apache.org">user@cassandra=
.apache.org</a>&quot; &lt;<a href=3D"mailto:user@cassandra.apache.org">user=
@cassandra.apache.org</a>&gt;<br>

Subject: Re: Correct model<br>
<br>
<br>
<br>
2012/9/24 Hiller, Dean &lt;<a href=3D"mailto:Dean.Hiller@nrel.gov">Dean.Hil=
ler@nrel.gov</a>&gt;<br>

<div class=3D"HOEnZb"><div class=3D"h5">I am confused. =A0In this email you=
 say you want &quot;get all requests for a user&quot; and in a previous one=
 you said &quot;Select all the users which has new requests, since date D&q=
uot; so let me answer both=85<br>

<br>
I have both needs. These are the two queries I need to perform on the model=
.<br>
<br>
For the latter, you make ONE query into the latest partition (ONE
partition) of the GlobalRequestsCF, which gives you the most recent
requests ALONG with the user ids of those requests. =A0If you queried all
partitions, you would most likely blow out your JVM memory.<br>

<br>
For the former, you make ONE query to the UserRequestsCF with userid =3D &l=
t;your user id&gt; to get all the requests for that user<br>
<br>
Now I think I got the main idea! This answered a lot!<br>
<br>
Sorry, I was skipping some context. =A0The backing indexing is often
done as one long row, so in PlayOrm, too many rows in a partition means
too many columns in the indexing row for that partition. =A0I believe the
same is true of Cassandra&#39;s own indexing.<br>

<br>
Oh, ok, you were talking about the wide row pattern, right? But playORM is
compatible with Aaron&#39;s model, isn&#39;t it? Can I map exactly this
using playORM? The hardest part of using playORM for me right now is that
I don&#39;t know Cassandra well yet, and I know playORM even less. Can I
ask playOrm questions on this list? I will try to create a POC here!<br>

Only now I am starting to understand what it does ;-) The examples director=
y is empty for now, I would like to see how to set up the connection with i=
t.<br>
<br>
Cassandra spreads all your data out on all nodes with or without
partitions. =A0A single partition does have its data co-located though.<br>
<br>
Now I see. The main advantage of using partitions is keeping the indexes sm=
all enough. It has nothing to do with the nodes. Thanks!<br>
<br>
If you are at 100k (and the requests are rather small), you could embed
all the requests in the User or go with Aaron&#39;s suggestion below of a
UserRequestsCF. =A0If your requests are rather large, you probably
don&#39;t want to embed them in the User. =A0Either way, it&#39;s one
query or one row key lookup.<br>

<br>
I see it now.<br>
<br>
Multiget ignores partitions=85you feed it a LIST of keys and it gets them. =
=A0It just so happens that partitionId had to be part of your row key.<br>
<br>
Do you mean I need to load all the keys in memory to do a multiget?<br>
<br>
I have used Hector and now use Astyanax. =A0I don&#39;t worry much about
that layer, but I feed Astyanax 3 nodes and I believe it discovers some
of the other ones. =A0I believe the latter is true but am not 100% sure,
as I have not looked at that code.<br>

<br>
Why did you move? Hector is being considered for being the &quot;official&q=
uot; client for Cassandra, isn&#39;t it? I looked at the Astyanax api and i=
t seemed much more high level though<br>
<br>
As an analogy to the above, if you were using PlayOrm, you would need
ONLY one Requests table, partitioned by user AND by time (two views into
the same data, partitioned two different ways), and you could do exactly
the same thing as Aaron&#39;s example. =A0PlayOrm doesn&#39;t embed the
partition ids in the key, which leaves it free to partition twice, as in
your case. =A0With the partition id in the key, a refactor has to
map/reduce A LOT more rows, because other rows hold the FK of
&lt;partitionid&gt;&lt;subrowkey&gt;, whereas if you don&#39;t have the
partition id in the key, you only map/reduce the partitioned table in a
redesign/refactor. =A0That said, we will be adding support for CQL
partitioning in addition to PlayOrm partitioning, even though it can be
a little less flexible sometimes.<br>

<br>
I am not sure I understood this part. If I need to refactor, having the par=
tition id in the key would be a bad thing? What would be the alternative? I=
n my case, as I use userId : partitionId as row key, this might be a proble=
m, right?<br>

<br>
Also, CQL locates all the data for a partition on one node. =A0We have
found it can &quot;sometimes&quot; be faster, thanks to the parallelized
disks, when a partition&#39;s rows are NOT all on one node, so PlayOrm
partitions are virtual only and do not relate to where the rows are
stored. =A0As an example, on our 6 nodes a join query on a partition with
1,000,000 rows took 60ms (of course I can&#39;t compare to CQL here since
it doesn&#39;t do joins). =A0It also really depends on how much data is
going to come back in the query. =A0There are tradeoffs, of course,
between parallel disks across nodes and having your data all on one
node.<br>

<br>
I guess I am still not ready for this level of info. :D<br>
In the playORM readme, we have the following:<br>
<br>
<br>
@NoSqlQuery(name=3D&quot;findWithJoinQuery&quot;, query=3D&quot;PARTITIONS =
t(:partId) SELECT t FROM TABLE as t &quot;+<br>
&quot;INNER JOIN t.activityTypeInfo as i WHERE i.type =3D :type and t.numSh=
ares &lt; :shares&quot;),<br>
<br>
What would happen behind the scenes when I execute this query? You can only=
 use joins with partition keys, right?<br>
In this case, is partId the row id of TABLE CF?<br>
<br>
<br>
Thanks a lot for the answers<br>
<br>
--<br>
Marcelo Elias Del Valle<br>
<a href=3D"http://mvalle.com" target=3D"_blank">http://mvalle.com</a> - @mv=
allebr<br>
</div></div></blockquote></div><br><br clear=3D"all"><div><br></div>-- <br>=
Marcelo Elias Del Valle<br><a href=3D"http://mvalle.com" target=3D"_blank">=
http://mvalle.com</a>=A0- @mvallebr<br>
</div>

--f46d04016c2358d56204ca8761fd--