From: "Hiller, Dean"
To: "user@cassandra.apache.org"
Date: Mon, 24 Sep 2012 09:24:13 -0600
Subject: Re: Correct model

I am confused. In this email you say you want "get all requests for a user" and in a previous one you said "Select all the users which has new requests, since date D", so let me answer both...

For the latter, you make ONE query into the latest partition (ONE partition) of the GlobalRequestsCF, which gives you the most recent requests ALONG with the user ids of those requests. If you queried all partitions, you would most likely blow out your JVM memory.

For the former, you make ONE query to the UserRequestsCF with userid = to get all the requests for that user.

You mean too many rows, not a row too long, right? I am assuming each request will be a different row, not a new column. Is having billions of ROWS something non-performant in Cassandra? I know Cassandra allows up to 2 billion columns for a CF, but I am not aware of a limitation for rows...

Sorry, I was skipping some context. A lot of the backing indexing sometimes is done as a long row, so in playOrm, too many rows in a partition means too many columns in the indexing row for that partition. I believe the same is true in cassandra for their indexing.

If I understood it correctly, if I don't specify partitions, Cassandra will store all my data in a single node?

Cassandra spreads all your data out on all nodes with or without partitions. A single partition does have its data co-located though.

If 99.999% of my users will have less than 100k requests, would it make sense to partition by user?

If you are at 100k (and the requests are rather small), you could embed all the requests in the user or go with Aaron's suggestion below of a UserRequestsCF. If your requests are rather large, you probably don't want to embed them in the User. Either way, it's one query or one row key lookup.

That's cool!
:D So if I need to query data split in 10 partitions, for instance, I can perform the query in parallel by using a multiget, right?

Multiget ignores partitions...you feed it a LIST of keys and it gets them. It just so happens that partitionId had to be part of your row key.

Out of curiosity, if each get will occur on a different node, I would need to connect to each of the nodes? Or would I query 1 node and it would communicate to others?

I have used Hector and now use Astyanax, I don't worry much about that layer, but I feed astyanax 3 nodes and I believe it discovers some of the other ones. I believe the latter is true but am not 100% sure as I have not looked at that code.

As an analogy on the above, if you happen to have used PlayOrm, you would ONLY need one Requests table and you partition by user AND time (two views into the same data partitioned two different ways) and you can do exactly the same thing as Aaron's example. PlayOrm doesn't embed the partition ids in the key leaving it free to partition twice like in your case....and in a refactor, you have to map/reduce A LOT more rows because of rows having the FK of whereas if you don't have partition id in the key, you only map/reduce the partitioned table in a redesign/refactor. That said, we will be adding support for CQL partitioning in addition to PlayOrm partitioning even though it can be a little less flexible sometimes.

Also, CQL locates all the data on one node for a partition. We have found it can be faster "sometimes" with the parallelized disks that the partitions are NOT all on one node so PlayOrm partitions are virtual only and do not relate to where the rows are stored. An example on our 6 nodes was a join query on a partition with 1,000,000 rows took 60ms (of course I can't compare to CQL here since it doesn't do joins). It really depends how much data is going to come back in the query though too?
There are tradeoffs between disk parallel nodes and having your data all on one node, of course.

Later,
Dean

From: Marcelo Elias Del Valle
Reply-To: "user@cassandra.apache.org"
Date: Monday, September 24, 2012 7:45 AM
To: "user@cassandra.apache.org"
Subject: Re: Correct model

2012/9/23 Hiller, Dean

You need to split data among partitions or your query won't scale as more and more data is added to table. Having the partition means you are querying a lot less rows.

This will happen in case I can query just one partition. But if I need to query things in multiple partitions, wouldn't it be slower?

He means determine the ONE partition key and query that partition. I.e., if you want just latest user requests, figure out the partition key based on which month you are in and query it. If you want the latest independent of user, query the correct single partition for GlobalRequests CF.

But in this case, I didn't understand Aaron's model then. My first query is to get all requests for a user. If I did partitions by time, I will need to query all partitions to get the results, right? In his answer it was said I would query ONE partition...

If I want all the requests for the user, couldn't I just select all UserRequest records which start with "userId"?

He designed it so the user requests table was completely scalable, so he has partitions there. If you don't have partitions, you could run into a row that is toooo long. You don't need to design it this way if you know none of your users are going to go into the millions as far as number of requests. In his design then, you need to pick the correct partition and query into that partition.

You mean too many rows, not a row too long, right? I am assuming each request will be a different row, not a new column. Is having billions of ROWS something non-performant in Cassandra? I know Cassandra allows up to 2 billion columns for a CF, but I am not aware of a limitation for rows...

I really didn't understand why to use partitions.

Partitions are a way, if you know your rows will go into the trillions, of breaking them up so each partition has 100k rows or so, or even 1 million, but maxes out in the millions most likely. Without partitions, you hit a limit in the millions. With partitions, you can keep scaling past that as you can have as many partitions as you want.

If I understood it correctly, if I don't specify partitions, Cassandra will store all my data in a single node? I thought Cassandra would automatically distribute my data among nodes as I insert rows into a CF. Of course if I use partitions I understand I could query just one partition (node) to get the data, if I have the partition field, but to the best of my knowledge, this is not what happens in my case, right? In the first query I would have to query all the partitions...

Or you are saying partitions have nothing to do with nodes?? If 99.999% of my users will have less than 100k requests, would it make sense to partition by user?

A multi-get is a query that finds IN PARALLEL all the rows with the matching keys you send to cassandra. If you do 1000 gets (instead of a multi-get) with 1ms latency, you will find it takes 1 second + processing time. If you do ONE multi-get, you only have 1 request and therefore 1ms latency. The other solution is you could send 1000 "async" gets, but I have a feeling that would be slower with all the marshalling/unmarshalling of the envelope... it really depends on the envelope size; like if we were using http, you would get killed doing 1000 requests instead of 1 with 1000 keys in it.

That's cool! :D So if I need to query data split in 10 partitions, for instance, I can perform the query in parallel by using a multiget, right? Out of curiosity, if each get will occur on a different node, I would need to connect to each of the nodes? Or would I query 1 node and it would communicate to others?

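The latency arithmetic in the multi-get explanation above can be written out as a small sketch. It only counts network round trips, under the assumption that each blocking get costs one full round trip and that a multi-get costs one round trip regardless of how many keys it carries; server-side processing time is ignored, and the function names are made up for illustration.

```python
def sequential_latency_ms(num_keys: int, round_trip_ms: float) -> float:
    """Network wait when every key is fetched with its own blocking get."""
    return num_keys * round_trip_ms


def multiget_latency_ms(num_keys: int, round_trip_ms: float) -> float:
    """Network wait when all keys travel in one multi-get request."""
    # One request, one round trip (assumption: payload size does not matter).
    return round_trip_ms if num_keys > 0 else 0.0


if __name__ == "__main__":
    # The numbers from the thread: 1000 keys, 1 ms per round trip.
    print(sequential_latency_ms(1000, 1.0), "ms for 1000 separate gets")
    print(multiget_latency_ms(1000, 1.0), "ms for one multi-get")
```

With the thread's numbers this gives 1000 ms for the separate gets versus 1 ms for the single multi-get, before any processing time is added.
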
Later,
Dean

From: Marcelo Elias Del Valle
Reply-To: "user@cassandra.apache.org"
Date: Sunday, September 23, 2012 10:23 AM
To: "user@cassandra.apache.org"
Subject: Re: Correct model

2012/9/20 aaron morton

I would consider:

# User CF
* row_key: user_id
* columns: user properties, key=value

# UserRequests CF
* row_key: where partition_start is the start of a time partition that makes sense in your domain, e.g. partition monthly. Generally want to avoid rows that grow forever; as a rule of thumb avoid rows more than a few 10's of MB.
* columns: two possible approaches:
1) If the requests are immutable and you generally want all of the data, store the request in a single column using JSON or similar, with the column name a timestamp.
2) Otherwise use a composite column name of = to store the request in many columns.
* In either case consider using Reversed comparators so the most recent columns are first, see http://thelastpickle.com/2011/10/03/Reverse-Comparators/

# GlobalRequests CF
* row_key: partition_start - time partition as above. It may be easier to use the same partition scheme.
* column name:
* column value: empty

Ok, I think I understood your suggestion... But the only advantage in this solution is to split data among partitions? I understood how it would work, but I didn't understand why it's better than the other solution, without the GlobalRequests CF

- Select all the requests for an user

Work out the current partition client side, get the first N columns. Then page.

What do you mean here by current partition? You mean I would perform a query for each partition? If I want all the requests for the user, couldn't I just select all UserRequest records which start with "userId"? I might be missing something here, but in my understanding if I use hector to query a column family I can do that and Cassandra servers will automatically communicate to each other to get the data I need, right? Is it bad?
I really didn't understand why to use partitions.

- Select all the users which has new requests, since date D

Work out the current partition client side, get the first N columns from GlobalRequests, make a multi get call to UserRequests

NOTE: Assuming the size of the global requests space is not huge.

Hope that helps.

For sure it is helping a lot. However, I don't know what is a multiget... I saw the hector api reference and found this method, but not sure about what Cassandra would do internally if I do a multiget... Is this expensive in terms of performance and latency?

--
Marcelo Elias Del Valle
http://mvalle.com - @mvallebr
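
The "work out the current partition client side, read GlobalRequests, then multi-get UserRequests" flow discussed in this thread can be sketched roughly as below. This is only an illustration under several assumptions: partitions are monthly, the UserRequests row key is the user id and the partition start joined by ":" (a hypothetical layout), plain Python dicts stand in for the column families, and `multiget` is a local stand-in function, not the Hector or Astyanax API.

```python
from datetime import datetime


def month_partition_start(ts):
    """Start of the monthly partition containing ts, as 'YYYY-MM'."""
    return f"{ts.year:04d}-{ts.month:02d}"


def user_requests_key(user_id, ts):
    """Hypothetical UserRequests row key: user id plus partition start."""
    return f"{user_id}:{month_partition_start(ts)}"


def users_with_new_requests(global_requests, now, since, limit):
    """Read the first `limit` columns of the current GlobalRequests partition.

    global_requests maps partition_start -> list of (timestamp, user_id)
    pairs, newest first (mimicking a reversed comparator).
    """
    columns = global_requests.get(month_partition_start(now), [])[:limit]
    user_ids = []
    for ts, user_id in columns:
        if ts < since:
            break  # columns are newest first, so everything after is older
        if user_id not in user_ids:
            user_ids.append(user_id)
    return user_ids


def multiget(user_requests, keys):
    """Stand-in for a client multi-get: one call carrying many row keys."""
    return {key: user_requests[key] for key in keys if key in user_requests}


if __name__ == "__main__":
    now = datetime(2012, 9, 24)
    global_requests = {
        "2012-09": [
            (datetime(2012, 9, 23), "bob"),
            (datetime(2012, 9, 22), "alice"),
            (datetime(2012, 9, 21), "bob"),
            (datetime(2012, 9, 10), "carol"),
        ]
    }
    user_requests = {
        "bob:2012-09": ["req-3", "req-1"],
        "alice:2012-09": ["req-2"],
        "carol:2012-09": ["req-0"],
    }
    users = users_with_new_requests(
        global_requests, now, since=datetime(2012, 9, 20), limit=100
    )
    rows = multiget(user_requests, [user_requests_key(u, now) for u in users])
    print(users)
    print(rows)
```

Note that this sketch reads only the current partition; if date D falls before the start of the current month, earlier partitions would have to be read as well.
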