accumulo-notifications mailing list archives

From "Keith Turner (JIRA)" <>
Subject [jira] [Commented] (ACCUMULO-2204) Add delete client to continuous ingest
Date Thu, 16 Jan 2014 16:54:19 GMT


Keith Turner commented on ACCUMULO-2204:

Scratch my previous suggestion about creating a circle. I do not think there is a good order
to do the writes such that the client can be killed at any time.

Thinking about this made me realize my initial assumptions about collision probabilities were
wrong.  I thought through it again and determined that there is not really a problem, but
for different reasons.  I was thinking that only a delete client and an ingest client picking
the same random row at around the same time mattered.  However, time is not a factor.

Below is an example of the problem I thought of.

 # ingest client A writes 7:A->5:A->29:A->13:A->19:A
 # ingest client B writes 83:B->37:B->29:B->97:B
 # delete client C writes 5:A->19:A and deletes 29:B and 13:A

I was thinking this would cause 37:B to point to a non-existent node (note the ingest id is
in the value and not part of the key).  However, I completely forgot about columns.  The
row is a 63-bit random number by default, but there are also 15 random bits in the family and
15 bits in the qualifier.  So even if A and B choose the same row, there is only a 1 in 2^30
chance that they choose the same column.  Therefore it's important that the delete client only
follow the same ingest client.  Below is the example above, including columns.

 # ingest client A writes 7:1:2:A->5, 5:3:4:A->29, 29:5:6:A->13, 13:7:8:A->19
 # ingest client B writes 83:11:12:B->37, 37:13:14:B->29, 29:15:16:B->97, 97:17:18:B->null
 # delete client C writes 5:3:4:A->19 and deletes 29:5:6 and 13:7:8

So 29:15:16 written by B still exists and there is no problem.
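The key space above can be sketched in a few lines.  This is an illustrative model of the key structure described in the comment (63-bit random row, 15-bit family, 15-bit qualifier), not the actual continuous ingest code; class and method names are hypothetical.

```java
import java.util.Random;

// Sketch of a continuous-ingest-style node key: a 63-bit random row
// plus 15 random bits each in the column family and qualifier.
public class NodeKeySketch {
    static final long ROW_MASK = (1L << 63) - 1; // 63-bit row space
    static final int COL_MASK = (1 << 15) - 1;   // 15-bit cf and cq

    static long randomRow(Random r) {
        return r.nextLong() & ROW_MASK;
    }

    static int randomCol(Random r) {
        return r.nextInt() & COL_MASK;
    }

    public static void main(String[] args) {
        Random r = new Random();
        long row = randomRow(r);
        int cf = randomCol(r);
        int cq = randomCol(r);
        // Even if two clients pick the same row, they must also match both
        // 15-bit columns, so a full key collision within the same row is a
        // 1 in 2^30 event.
        System.out.printf("%016x:%04x:%04x%n", row, cf, cq);
    }
}
```

The 1 in 2^30 figure is just the product of the two independent 15-bit column choices.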

tl;dr  The delete client should only delete a row+column created by the same ingest client.
 The delete client should not create a circular linked list, unless there is a fault-tolerant
way to do it.
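The tl;dr rule amounts to a guard on the ingest id stored in the value.  Below is a minimal sketch of that check, assuming an illustrative "row:cf:cq" -> "id:next" entry layout; this is not the actual continuous ingest encoding.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: the delete client checks that the ingest id in the value
// matches the chain it is following before deleting an entry.
public class SameIngestGuard {
    static boolean safeToDelete(String value, String followedId) {
        // The ingest id is in the value, not part of the key.
        return value.split(":")[0].equals(followedId);
    }

    public static void main(String[] args) {
        Map<String, String> table = new LinkedHashMap<>();
        table.put("29:5:6", "A:13");   // written by ingest client A
        table.put("29:15:16", "B:97"); // same row, different column, client B
        for (Map.Entry<String, String> e : table.entrySet()) {
            System.out.println(e.getKey() + " deletable by a client following A: "
                + safeToDelete(e.getValue(), "A"));
        }
    }
}
```

With this guard, delete client C in the example above removes 29:5:6 but leaves B's 29:15:16 alone.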


> Add delete client to continuous ingest
> --------------------------------------
>                 Key: ACCUMULO-2204
>                 URL:
>             Project: Accumulo
>          Issue Type: Sub-task
>            Reporter: Keith Turner
> Adding the linked list operation of deleting nodes would make it possible to detect deleted
data coming back.  Could create something similar to the walker that does the following.
>  # selects a random node X
>  # follows the linked list for a random number of times and stops at node Y
>  # makes X point to Y
>  # deletes all nodes that were between X and Y in the list
> For example given the following linked list 
> {noformat}
>    7->5->29->13->19->23->17
> {noformat}
> If 5 were picked as the first node and 23 as the last node, then the following operations
would be done.
>  # write 5->23
>  # flush
>  # delete 29
>  # flush
>  # delete 13
>  # flush
>  # delete 19
>  # flush
>  # do batch read and/or scan to verify deletes 
> If 29 or 13 should come back, then the nodes they point to would not exist and verification
would catch this.  I think the operations above are done in such a way that the delete client
could be killed at any time.
> Since continuous ingest works with random numbers there is a small chance that the delete
client could delete a node just written by another client.  With 63 bit random numbers this
chance is exceedingly small.  Should it occur, the person debugging should be able to sort
it out when looking at the write ahead logs.  Therefore I do not think it's worthwhile taking
any action in the test.
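The write-then-delete ordering in the quoted steps can be modeled with an in-memory map.  This is a sketch of the proposed ordering only, not the real Accumulo client; in the actual delete client each mutation would go through a BatchWriter with a flush where the comments indicate.

```java
import java.util.HashMap;
import java.util.Map;

// Model of the proposed delete ordering: write X->Y (and flush) before
// deleting the bypassed nodes, flushing after each delete, so the client
// can be killed at any step and verification still passes.
public class DeleteOrderSketch {
    static void deleteBetween(Map<Integer, Integer> list, int x, int y, int... between) {
        list.put(x, y);          // step 1: write 5->23, then flush
        for (int n : between) {  // later steps: delete 29, 13, 19, flushing after each
            list.remove(n);
        }
    }

    public static void main(String[] args) {
        // 7->5->29->13->19->23->17
        Map<Integer, Integer> list = new HashMap<>();
        int[] nodes = {7, 5, 29, 13, 19, 23, 17};
        for (int i = 0; i < nodes.length - 1; i++) {
            list.put(nodes[i], nodes[i + 1]);
        }
        deleteBetween(list, 5, 23, 29, 13, 19);
        // Verify no remaining pointer targets a deleted node
        // (17 is the tail and points at nothing).
        for (int next : list.values()) {
            if (next != 17 && !list.containsKey(next)) {
                throw new AssertionError("dangling pointer to " + next);
            }
        }
        System.out.println(list);
    }
}
```

Note why the ordering is kill-safe: once 5->23 is flushed, a crash mid-delete leaves nodes like 13->19 orphaned but still pointing at existing nodes, so a scan finds no dangling references.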

This message was sent by Atlassian JIRA
