incubator-cassandra-user mailing list archives

From Ed Anuff ...@anuff.com>
Subject Re: indexes from CassandraSF
Date Sun, 13 Nov 2011 23:57:25 GMT
Yes, correct, it's not going to clean itself.  Using your example with
a little more detail:

1 ) A(T1) reads previous location (T0,L0) from index_entries for user U0
2 ) B(T2) reads previous location (T0,L0) from index_entries for user U0
3 ) A(T1) deletes previous location (T0,L0) from index_entries for user U0
4 ) B(T2) deletes previous location (T0,L0) from index_entries for user U0
5 ) A(T1) deletes previous location (L0,T0,U0) for user U0 from index
6 ) B(T2) deletes previous location (L0,T0,U0) for user U0 from index
7 ) A(T1) inserts new location (T1,L1) into index_entries for user U0
8 ) B(T2) inserts new location (T2,L2) into index_entries for user U0
9 ) index_entries for user U0 now contains (T1,L1),(T2,L2)
10) A(T1) inserts new location (L1,T1,U0) for user U0 into index
11) B(T2) inserts new location (L2,T2,U0) for user U0 into index
12) A(T1) sets new location (L1) on user U0
13) B(T2) sets new location (L2) on user U0
14) C(T3) queries for users where location equals L1, gets back user
U0 where current location is actually L2
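
To make the interleaving above concrete, here is a minimal sketch that replays steps 1-14 in Python. The dicts and sets are hypothetical in-memory stand-ins for the index_entries, index and users column families, and the timestamps are plain strings rather than time UUIDs; it is only meant to show the end state, not how a real client would issue the mutations.

```python
# Hypothetical in-memory stand-ins for the three column families.
index_entries = {"U0": {("T0", "L0")}}   # user -> {(timestamp, location)}
index = {("L0", "T0", "U0")}             # {(location, timestamp, user)}
users = {"U0": "L0"}                     # user -> current location


def query(location):
    """Return the users the index lists under `location`, unverified."""
    return sorted({user for (loc, _ts, user) in index if loc == location})


# Steps 1-2: A and B both read the same previous entry (T0, L0).
prev_a = set(index_entries["U0"])
prev_b = set(index_entries["U0"])

# Steps 3-4: both delete it from index_entries (the second delete is a no-op).
index_entries["U0"] -= prev_a
index_entries["U0"] -= prev_b

# Steps 5-6: both delete the matching column from index.
for ts, loc in prev_a | prev_b:
    index.discard((loc, ts, "U0"))

# Steps 7-8: each writer records its own new entry in index_entries.
index_entries["U0"].add(("T1", "L1"))
index_entries["U0"].add(("T2", "L2"))

# Steps 10-11: each writer adds its own column to index.
index.add(("L1", "T1", "U0"))
index.add(("L2", "T2", "U0"))

# Steps 12-13: A's write lands first, B's write wins.
users["U0"] = "L1"
users["U0"] = "L2"

# Step 14: C queries by L1 and gets a stale hit.
stale_hits = query("L1")
print(stale_hits, users["U0"])  # ['U0'] L2
```

Note that index_entries still holds both (T1,L1) and (T2,L2) at the end, which is what makes the later cleanup possible.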

So, you want to either verify on read, by making sure the queried field
still holds the value you searched for before returning the row in your
result set to the rest of your app, or you want to use locking (e.g.,
lock on (U0,"location") during updates).  The key thing here is that
although the index is not in the desired state at (14), the information
needed to get to that state is still in the system (the previous values
in index_entries).  This lets the cleanup happen on the next update of
location for user U0:

15) D(T4) reads previous locations (T1,L1),(T2,L2) from index_entries
for user U0
16) D(T4) deletes previous locations (T1,L1),(T2,L2) from
index_entries for user U0
17) D(T4) deletes previous locations (L1,T1,U0),(L2,T2,U0) for user U0
from index
18) D(T4) inserts new location (T4,L3) into index_entries for user U0
19) D(T4) inserts new location (L3,T4,U0) for user U0 into index
20) D(T4) sets new location (L3) on user U0
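
As a rough sketch of both remedies, the Python below starts from the state left behind at step 14, shows a verify-on-read query, and then replays the cleanup in steps 15-20. The in-memory dicts and the function names query_verified and update_location are hypothetical, chosen only for illustration; a real client would send the deletes and inserts as a batch.

```python
# State after step 13: two index columns for U0, only L2 is current.
index_entries = {"U0": {("T1", "L1"), ("T2", "L2")}}   # user -> {(ts, loc)}
index = {("L1", "T1", "U0"), ("L2", "T2", "U0")}       # {(loc, ts, user)}
users = {"U0": "L2"}                                   # user -> location


def query_verified(location):
    """Verify on read: keep an index hit only if the user row still matches."""
    hits = {user for (loc, _ts, user) in index if loc == location}
    return sorted(user for user in hits if users.get(user) == location)


def update_location(user, ts, location):
    """Replay steps 15-20 for a single writer."""
    previous = set(index_entries.get(user, set()))        # step 15: read
    index_entries[user] = set()                           # step 16: delete
    for old_ts, old_loc in previous:                      # step 17: delete
        index.discard((old_loc, old_ts, user))
    index_entries[user].add((ts, location))               # step 18: insert
    index.add((location, ts, user))                       # step 19: insert
    users[user] = location                                # step 20: set


# The stale L1 column is filtered out, the current L2 column is kept.
hits_l1_before = query_verified("L1")
hits_l2_before = query_verified("L2")

# D's update removes every previous entry, including the stale one.
update_location("U0", "T4", "L3")
print(hits_l1_before, hits_l2_before, sorted(index))
```

After the update the index holds a single column for U0, so an unverified query would be correct again until the next concurrent write.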

BTW, just to reiterate since this sometimes comes up, the timestamps
being stored in these tuples are not longs, they're time UUIDs, so T1
and T2 are never equal.
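
A small illustration of that point, using Python's standard uuid module as a stand-in (an assumption on my part; the sample code may generate its time UUIDs with a different library). A version 1 UUID carries a node and clock sequence alongside the timestamp, so two values can share the same time component and still compare as different.

```python
import uuid

# Two time-based (version 1) UUIDs generated back to back.
t1 = uuid.uuid1()
t2 = uuid.uuid1()

# Build a UUID with exactly the same time fields as t1 but a different
# node, as if another writer had produced it at the same instant.
# Flipping the low bit of the node is only for illustration.
same_instant = uuid.UUID(fields=t1.fields[:5] + (t1.node ^ 1,))

print(t1.version, t1 != t2)
print(same_instant.time == t1.time, same_instant != t1)
```

With a plain long, the two writers in the last case would have produced identical column names and one would have overwritten the other.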

Ed


On Sun, Nov 13, 2011 at 6:52 AM, Guy Incognito <dnd1066@gmail.com> wrote:
> [1] i'm not particularly worried about transient conditions so that's ok.  i
> think there's still the possibility of a non-transient false positive...if 2
> writes were to happen at exactly the same time (highly unlikely), eg
>
> 1) A reads previous location (L1) from index entries
> 2) B reads previous location (L1) from index entries
> 3) A deletes previous location (L1) from index entries
> 4) B deletes previous location (L1) from index entries
> 5) A deletes previous location (L1) from index
> 6) B deletes previous location (L1) from index
> 7) A enters new location (L2) into index entries
> 8) B enters new location (L3) into index entries
> 9 ) A enters new location (L2) into index
> 10) B enters new location (L3) into index
> 11) A sets new location (L2) on users
> 12) B sets new location (L3) on users
>
> after this, don't i end up with an incorrect L2 location in index entries
> and in the index, that won't be resolved until the next write of location
> for that user?
>
> [2] ah i see...so the client would continuously retry until the update
> works.  that's fine provided the client doesn't bomb out with some other
> error, if that were to happen then i have potentially deleted the index
> entry columns without deleting the corresponding index columns.
>
> i can handle both of the above for my use case, i just want to clarify
> whether they are possible (however unlikely) scenarios.
>
> On 13/11/2011 02:41, Ed Anuff wrote:
>>
>> 1) The index updates should be eventually consistent.  This does mean
>> that you can get a transient false-positive on your search results.
>> If this doesn't work for you, then you either need to use ZK or some
>> other locking solution or do "read repair" by making sure that the row
>> you retrieve contains the value you're searching for before passing it
>> on to the rest of your application.
>>
>> 2)  You should be able to reapply the batch updates til they succeed.
>> The update is idempotent.  One thing that's important that the slides
>> don't make clear is that this requires using time-based uuids as your
>> timestamp components.  Take a look at the sample code.
>>
>> Hope this helps,
>>
>> Ed
>>
>> On Sat, Nov 12, 2011 at 3:59 PM, Guy Incognito<dnd1066@gmail.com>  wrote:
>>>
>>> help?
>>>
>>> On 10/11/2011 19:34, Guy Incognito wrote:
>>>>
>>>> hi,
>>>>
>>>> i've been looking at the model below from Ed Anuff's presentation at
>>>> Cassandra SF (http://www.slideshare.net/edanuff/indexing-in-cassandra).
>>>>  Couple of questions:
>>>>
>>>> 1) Isn't there still the chance that two concurrent updates may end up
>>>> with the index containing two entries for the given user, only one of
>>>> which would match the actual value in the Users cf?
>>>>
>>>> 2) What happens if your batch fails partway through the update?  If i
>>>> understand correctly there are no guarantees about ordering when a
>>>> batch is executed, so isn't it possible that eg the previous value
>>>> entries in Users_Index_Entries may have been deleted, and then the
>>>> batch fails before the entries in Indexes are deleted, ie the
>>>> mechanism has 'lost' those values?  I assume this can be addressed by
>>>> not deleting the old entries until the batch has succeeded (ie put the
>>>> previous entry deletion into a separate, subsequent batch).  this at
>>>> least lets you retry at a later time.
>>>>
>>>> perhaps i'm missing something?
>>>>
>>>> SELECT {"location"}..{"location", *}
>>>> FROM Users_Index_Entries WHERE KEY =<user_key>;
>>>>
>>>> BEGIN BATCH
>>>>
>>>> DELETE {"location", ts1}, {"location", ts2}, ...
>>>> FROM Users_Index_Entries WHERE KEY =<user_key>;
>>>>
>>>> DELETE {<value1>,<user_key>, ts1}, {<value2>,<user_key>, ts2}, ...
>>>> FROM Indexes WHERE KEY = "Users_By_Location";
>>>>
>>>> UPDATE Users_Index_Entries SET {"location", ts3} =<value3>
>>>> WHERE KEY=<user_key>;
>>>>
>>>> UPDATE Indexes SET {<value3>,<user_key>, ts3} = null
>>>> WHERE KEY = "Users_By_Location";
>>>>
>>>> UPDATE Users SET location =<value3>
>>>> WHERE KEY =<user_key>;
>>>>
>>>> APPLY BATCH
>>>>
>>>
>
>
