From: "Hiller, Dean"
To: "user@cassandra.apache.org"
Date: Fri, 17 Aug 2012 07:30:15 -0600
Subject: Re: indexing question related to playOrm on github

I am not sure what you mean by play with the timestamp. I think this works without playing with the timestamp (thanks for your help, as it got me here).

1. On a scan I hit
2. I end up looking up the pk
3. I compare the value in the row with the indexed value "mike" but I see the row with that pk has Sam not Mike
4. I now know I can discard this result as a false positive. I also know my index has duplicates.
5. I kick off a job to scan the complete index now AND read in each pk row of the index, comparing the indexed value with the actual value in the row to fix the index.

I think that might work pretty well.

Thanks,
Dean

From: aaron morton
Reply-To: "user@cassandra.apache.org"
Date: Thursday, August 16, 2012 4:55 PM
To: "user@cassandra.apache.org"
Subject: Re: indexing question related to playOrm on github

I am not sure synchronization fixes that... It would be kind of nice if the column <65> would not actually be removed until after all servers are eventually consistent...

Not sure that's possible.

You can either serialise updating your custom secondary index on the client side or resolve the inconsistency on read.

Not sure this fits with your workload but as an e.g. when you read from the index, if you detect multiple row PKs, resolve the issue on the client and leave the data in cassandra as is. Then queue a job that will read the row and try to repair its index entries. When repairing the index entry, play with the timestamp so any deletions you make only apply to the column as it was when you saw the error.

Hope that helps.

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 17/08/2012, at 12:47 AM, "Hiller, Dean" wrote:

Maybe this would be a special type of column family that could contain these, as my other tables definitely don't want the feature below by the way.
Dean

On 8/16/12 6:29 AM, "Hiller, Dean" wrote:

Yes, the synch may work, and no, I do "not" want a transaction... I want a different kind of eventually consistent

That might work.

Let's say server 1 sends a mutation (65 is the pk)
Remove: <65>
Add <65>

Server 2 also sends a mutation (65 is the pk)
Remove: <65>
Add <65>

What everyone does not want is to end up with a row that has <65> and <65>. With the wide row pattern, we would like to have ONE or the other. I am not sure synchronization fixes that... It would be kind of nice if the column <65> would not actually be removed until after all servers are eventually consistent AND would keep a reference to the add that was happening, so that when it goes to resolve eventual consistency between the servers, it would see that <65> is newer and it would decide to drop the first add completely.

I.e., in a full process it might look like this

Cassandra node 1 receives remove <65>, add <65> AND in the remove column stores info about the add <65> until eventual consistency is completed

Cassandra node 2 one ms later receives remove <65> and <65> AND in the remove column stores info about the add <65> until eventual consistency is completed

Eventual consistency starts comparing node 1 and node 2 and finds <65> is being removed by different servers and finds add info attached to that. ONLY THE LAST add info is acknowledged and it makes the row consistent across the cluster.

That makes everyone's wide row indexing pattern tend to get less corrupt over time.

Thanks,
Dean

From: aaron morton
Reply-To: "user@cassandra.apache.org"
Date: Wednesday, August 15, 2012 8:26 PM
To: "user@cassandra.apache.org"
Subject: Re: indexing question related to playOrm on github

1. Can playOrm be listed on cassandra's list of ORMs? It supports a JQL/HQL query on a trillion rows in under 100ms (partitioning is the trick so you can JQL a partition)

Not sure if we have an ORM-specific page.
If it's a client then feel free to add it to http://wiki.apache.org/cassandra/ClientOptions

I was wondering if cassandra has or will ever support eventual consistency where it keeps both the REMOVE AND the ADD together until it is on all 3 replicated nodes and in resolving the consistency would end up with an index that only has the very last one in the index.

Not sure I fully understand but it sounds like you want a transaction, which is not going to happen.

Internally when Cassandra updates a secondary index it does the same thing. But it synchronises updates around the same row so one thread will apply the changes at a time.

Hope that helps.

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 16/08/2012, at 12:34 PM, "Hiller, Dean" wrote:

1. Can playOrm be listed on cassandra's list of ORMs? It supports a JQL/HQL query on a trillion rows in under 100ms (partitioning is the trick so you can JQL a partition)
2. Many applications have a common indexing problem and I was wondering if cassandra has or could have any support for this in the future... When using wide row indexes, you frequently have . as the composite key. This means when you have your object like so in the database

Activity {
  pk: 65
  name: bill
}

And then two servers want to save it as

Activity {
  pk:65
  name:tim
}

Activity {
  pk:65
  name:mike
}

Each server will remove <65> and BOTH servers will add <65> AND <65> BUT one of them will really be a lie!!!!!

I was wondering if cassandra has or will ever support eventual consistency where it keeps both the REMOVE AND the ADD together until it is on all 3 replicated nodes and in resolving the consistency would end up with an index that only has the very last one in the index.

Thanks,
Dean
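
For anyone following the thread, the read-time check and repair job in steps 1-5 of the most recent reply can be sketched roughly as below. This is only a minimal in-memory illustration, not playOrm or Cassandra code: the `rows` dict, the `index` set of (indexed value, pk) pairs, and the function names `lookup` and `repair_index` are all hypothetical stand-ins for a real column family and wide-row index, and the sketch ignores timestamps and concurrent writers entirely.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical in-memory stand-ins (assumptions, not Cassandra/playOrm APIs):
#   rows  maps pk -> row, e.g. {65: {"name": "sam"}}
#   index holds (indexed value, pk) pairs, e.g. {("mike", 65), ("sam", 65)}
IndexEntry = Tuple[str, int]


def lookup(index: Set[IndexEntry], rows: Dict[int, Dict[str, str]],
           column: str, value: str) -> Tuple[List[int], List[IndexEntry]]:
    """Steps 1-4: scan the index for `value`, read each row by pk,
    and separate real matches from false positives."""
    hits: List[int] = []
    stale: List[IndexEntry] = []
    for indexed_value, pk in sorted(index):
        if indexed_value != value:
            continue
        row = rows.get(pk)
        if row is not None and row.get(column) == value:
            hits.append(pk)                    # the row agrees with the index
        else:
            stale.append((indexed_value, pk))  # false positive, discard it
    return hits, stale


def repair_index(index: Set[IndexEntry], rows: Dict[int, Dict[str, str]],
                 column: str) -> List[IndexEntry]:
    """Step 5: scan the complete index and drop every entry whose
    indexed value no longer matches the value stored in the row."""
    removed: List[IndexEntry] = []
    for indexed_value, pk in sorted(index):   # sorted() copies, so discarding is safe
        row = rows.get(pk)
        if row is None or row.get(column) != indexed_value:
            index.discard((indexed_value, pk))
            removed.append((indexed_value, pk))
    return removed
```

With `rows = {65: {"name": "sam"}}` and `index = {("mike", 65), ("sam", 65)}`, calling `lookup(index, rows, "name", "mike")` finds no real hits and reports `("mike", 65)` as stale; a later `repair_index(index, rows, "name")` removes that entry and leaves only `("sam", 65)`. In a real cluster the removal would be a delete against the index row, which is where the timestamp suggestion earlier in the thread would come in.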