cassandra-user mailing list archives

From Guy Incognito <dnd1...@gmail.com>
Subject Re: indexes from CassandraSF
Date Mon, 14 Nov 2011 08:04:07 GMT
ok great, thanks ed, that's really helpful.

just wanted to make sure i wasn't missing something fundamental.

On 13/11/2011 23:57, Ed Anuff wrote:
> Yes, correct, it's not going to clean itself.  Using your example with
> a little more detail:
>
> 1 ) A(T1) reads previous location (T0,L0) from index_entries for user U0
> 2 ) B(T2) reads previous location (T0,L0) from index_entries for user U0
> 3 ) A(T1) deletes previous location (T0,L0) from index_entries for user U0
> 4 ) B(T2) deletes previous location (T0,L0) from index_entries for user U0
> 5 ) A(T1) deletes previous location (L0,T0,U0) for user U0 from index
> 6 ) B(T2) deletes previous location (L0,T0,U0) for user U0 from index
> 7 ) A(T1) inserts new location (T1,L1) into index_entries for user U0
> 8 ) B(T2) inserts new location (T2,L2) into index_entries for user U0
> 9 ) index_entries for user U0 now contains (T1,L1),(T2,L2)
> 10) A(T1) inserts new location (L1,T1,U0) for user U0 into index
> 11) B(T2) inserts new location (L2,T2,U0) for user U0 into index
> 12) A(T1) sets new location (L1) on user U0
> 13) B(T2) sets new location (L2) on user U0
> 14) C(T3) queries for users where location equals L1, gets back user
> U0 where current location is actually L2
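[Editor's note: the interleaving in steps 1-14 can be reproduced with a small in-memory simulation. Plain Python dicts and sets stand in for the column families here; none of these names are real Cassandra APIs, and string tags T0..T2 stand in for time UUIDs.]

```python
# Simulate the race: both writers read the same previous entry before
# either deletes it, so both new entries survive in the index.
index_entries = {"U0": {("T0", "L0")}}   # user -> set of (ts, location)
index = {("L0", "T0", "U0")}             # set of (location, ts, user)
users = {"U0": "L0"}

prev_a = set(index_entries["U0"])        # 1) A(T1) reads (T0, L0)
prev_b = set(index_entries["U0"])        # 2) B(T2) reads (T0, L0)

for ts, loc in prev_a:                   # 3), 5) A deletes previous entries
    index_entries["U0"].discard((ts, loc))
    index.discard((loc, ts, "U0"))
for ts, loc in prev_b:                   # 4), 6) B deletes the same (no-op)
    index_entries["U0"].discard((ts, loc))
    index.discard((loc, ts, "U0"))

index_entries["U0"].add(("T1", "L1"))    # 7) A inserts new entry
index_entries["U0"].add(("T2", "L2"))    # 8) B inserts new entry
index.add(("L1", "T1", "U0"))            # 10)
index.add(("L2", "T2", "U0"))            # 11)
users["U0"] = "L1"                       # 12) A sets location
users["U0"] = "L2"                       # 13) B sets location (T2 wins)

# 14) A query on L1 still finds U0, though its actual location is L2.
hits = [u for (loc, ts, u) in index if loc == "L1"]
print(hits, users["U0"])  # prints: ['U0'] L2
```

The stale (T1, L1) entry is exactly the false positive discussed below; note that index_entries retains both (T1, L1) and (T2, L2), which is what makes the later cleanup possible.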
>
> So, you want to either verify on read by making sure the queried field
> is correct before returning it in your result set to the rest of your
> app, or you want to use locking (ex. lock on (U0,"location") during
> updates).  The key thing here is that although the index is not in the
> desired state at (14), the information is in the system to get to that
> state (the previous values in index_entries).  This lets the cleanup
> happen on the next update of location for user U0:
>
> 15) D(T4) reads previous locations (T1,L1),(T2,L2) from index entries
> for user U0
> 16) D(T4) deletes previous locations (T1,L1),(T2,L2) from index
> entries for user U0
> 17) D(T4) deletes previous locations (L1,T1,U0),(L2,T2,U0) for user U0
> from index
> 18) D(T4) inserts new location (T4,L3) into index entries for user U0
> 19) D(T4) inserts new location (L3,T4,U0) for user U0 into index
> 20) D(T4) sets new location (L3) on user U0
>
> BTW, just to reiterate since this sometimes comes up, the timestamps
> being stored in these tuples are not longs, they're time UUIDs, so T1
> and T2 are never equal.
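[Editor's note: steps 15-20 can be sketched as a single update routine over the same in-memory stand-ins, with `uuid.uuid1()` playing the role of the time UUID components. This is an illustrative sketch of the pattern, not Ed's actual sample code.]

```python
import uuid

# Stale state left over after the race: two entries for user U0.
index_entries = {"U0": {("T1", "L1"), ("T2", "L2")}}
index = {("L1", "T1", "U0"), ("L2", "T2", "U0")}
users = {"U0": "L2"}

def set_location(user, new_loc):
    previous = set(index_entries.get(user, set()))  # 15) read all previous
    for ts, loc in previous:                        # 16), 17) delete them
        index_entries[user].discard((ts, loc))
        index.discard((loc, ts, user))
    ts = uuid.uuid1()                               # fresh time UUID (T4)
    index_entries[user].add((ts, new_loc))          # 18)
    index.add((new_loc, ts, user))                  # 19)
    users[user] = new_loc                           # 20)

set_location("U0", "L3")
# Both stale entries (L1 and L2) are cleaned up; only L3 remains.
print(sorted(loc for (loc, ts, u) in index))  # prints: ['L3']
```

Because `uuid.uuid1()` is a time-based UUID, two concurrent writers never produce equal timestamp components, so their entries stay distinct in index_entries rather than overwriting each other.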
>
> Ed
>
>
> On Sun, Nov 13, 2011 at 6:52 AM, Guy Incognito <dnd1066@gmail.com> wrote:
>> [1] i'm not particularly worried about transient conditions so that's ok.  i
>> think there's still the possibility of a non-transient false positive...if 2
>> writes were to happen at exactly the same time (highly unlikely), eg
>>
>> 1) A reads previous location (L1) from index entries
>> 2) B reads previous location (L1) from index entries
>> 3) A deletes previous location (L1) from index entries
>> 4) B deletes previous location (L1) from index entries
>> 5) A deletes previous location (L1) from index
>> 6) B deletes previous location (L1) from index
>> 7) A enters new location (L2) into index entries
>> 8) B enters new location (L3) into index entries
>> 9 ) A enters new location (L2) into index
>> 10) B enters new location (L3) into index
>> 11) A sets new location (L2) on users
>> 12) B sets new location (L3) on users
>>
>> after this, don't i end up with an incorrect L2 location in index entries
>> and in the index, that won't be resolved until the next write of location
>> for that user?
>>
>> [2] ah i see...so the client would continuously retry until the update
>> works.  that's fine provided the client doesn't bomb out with some other
>> error; if that were to happen, i would potentially have deleted the index
>> entry columns without deleting the corresponding index columns.
>>
>> i can handle both of the above for my use case, i just want to clarify
>> whether they are possible (however unlikely) scenarios.
>>
>> On 13/11/2011 02:41, Ed Anuff wrote:
>>> 1) The index updates should be eventually consistent.  This does mean
>>> that you can get a transient false-positive on your search results.
>>> If this doesn't work for you, then you either need to use ZK or some
>>> other locking solution or do "read repair" by making sure that the row
>>> you retrieve contains the value you're searching for before passing it
>>> on to the rest of your application.
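[Editor's note: the client-side "read repair" described here can be sketched as a filter over the query results. The dict-based stores are illustrative stand-ins for the actual Cassandra reads.]

```python
# Verify-on-read: after hitting the index, re-check the queried field on
# each candidate row and drop rows whose current value no longer matches.
index = {("L1", "T1", "U0"), ("L2", "T2", "U0")}   # contains a stale L1 entry
users = {"U0": {"location": "L2"}}

def users_by_location(loc):
    candidates = {u for (l, ts, u) in index if l == loc}
    # Only return users whose authoritative row still has this location.
    return [u for u in candidates if users[u]["location"] == loc]

print(users_by_location("L1"))  # prints: [] (stale entry filtered out)
print(users_by_location("L2"))  # prints: ['U0']
```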
>>>
>>> 2)  You should be able to reapply the batch updates until they succeed.
>>> The update is idempotent.  One thing that's important that the slides
>>> don't make clear is that this requires using time-based uuids as your
>>> timestamp components.  Take a look at the sample code.
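[Editor's note: the retry-until-success behavior relies on the batch being idempotent: deletes name exact (value, timestamp) columns and inserts name fixed new columns, so applying the batch twice leaves the same end state. A minimal sketch, with a simulated transient failure standing in for a failed Cassandra batch:]

```python
def apply_batch(state, deletes, inserts):
    for col in deletes:
        state.discard(col)   # deleting an already-absent column is a no-op
    for col in inserts:
        state.add(col)       # re-inserting the same column is a no-op

def apply_with_retry(state, deletes, inserts, attempts=5):
    for i in range(attempts):
        try:
            if i < 2:
                raise TimeoutError("simulated transient failure")
            apply_batch(state, deletes, inserts)
            return
        except TimeoutError:
            continue
    raise RuntimeError("batch never succeeded")

state = {("L0", "T0", "U0")}
apply_with_retry(state,
                 deletes=[("L0", "T0", "U0")],
                 inserts=[("L1", "T1", "U0")])
print(state)  # prints: {('L1', 'T1', 'U0')}
```

Repeating `apply_batch` with the same arguments leaves `state` unchanged, which is the property that makes blind retries safe.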
>>>
>>> Hope this helps,
>>>
>>> Ed
>>>
>>> On Sat, Nov 12, 2011 at 3:59 PM, Guy Incognito <dnd1066@gmail.com> wrote:
>>>> help?
>>>>
>>>> On 10/11/2011 19:34, Guy Incognito wrote:
>>>>> hi,
>>>>>
>>>>> i've been looking at the model below from Ed Anuff's presentation at
>>>>> Cassandra SF (http://www.slideshare.net/edanuff/indexing-in-cassandra).
>>>>>   Couple of questions:
>>>>>
>>>>> 1) Isn't there still the chance that two concurrent updates may end up
>>>>> with the index containing two entries for the given user, only one of
>>>>> which
>>>>> would match the actual value in the Users cf?
>>>>>
>>>>> 2) What happens if your batch fails partway through the update?  If i
>>>>> understand correctly there are no guarantees about ordering when a batch
>>>>> is
>>>>> executed, so isn't it possible that eg the previous
>>>>> value entries in Users_Index_Entries may have been deleted, and then the
>>>>> batch fails before the entries in Indexes are deleted, ie the mechanism
>>>>> has
>>>>> 'lost' those values?  I assume this can be addressed
>>>>> by not deleting the old entries until the batch has succeeded (ie put
>>>>> the
>>>>> previous entry deletion into a separate, subsequent batch).  this at
>>>>> least
>>>>> lets you retry at a later time.
>>>>>
>>>>> perhaps i'm missing something?
>>>>>
>>>>> SELECT {"location"}..{"location", *}
>>>>> FROM Users_Index_Entries WHERE KEY = <user_key>;
>>>>>
>>>>> BEGIN BATCH
>>>>>
>>>>> DELETE {"location", ts1}, {"location", ts2}, ...
>>>>> FROM Users_Index_Entries WHERE KEY = <user_key>;
>>>>>
>>>>> DELETE {<value1>, <user_key>, ts1}, {<value2>, <user_key>, ts2}, ...
>>>>> FROM Indexes WHERE KEY = "Users_By_Location";
>>>>>
>>>>> UPDATE Users_Index_Entries SET {"location", ts3} = <value3>
>>>>> WHERE KEY = <user_key>;
>>>>>
>>>>> UPDATE Indexes SET {<value3>, <user_key>, ts3} = null
>>>>> WHERE KEY = "Users_By_Location";
>>>>>
>>>>> UPDATE Users SET location = <value3>
>>>>> WHERE KEY = <user_key>;
>>>>>
>>>>> APPLY BATCH
>>>>>
>>

