accumulo-notifications mailing list archives

From "Keith Turner (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ACCUMULO-2205) Add compaction filter to continuous ingest
Date Thu, 16 Jan 2014 17:52:21 GMT

    [ https://issues.apache.org/jira/browse/ACCUMULO-2205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13873662#comment-13873662 ]

Keith Turner commented on ACCUMULO-2205:
----------------------------------------

The example is completely wrong.  I forgot about the columns and was only thinking of random rows.
I was thinking that row 5 would be deleted, but that's not the case because B would most likely
write different columns.  The chance of a collision in the combined row and column space is
1/2^93; I think there is a greater chance of two ingest clients choosing the same seed for the PRNG.
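
For reference, that 1/2^93 figure is just the product of the three independent random fields (assuming they are uniform): the probability that a single entry from one client lands on the same row, cf, and cq as a single entry from another client is

  P = 2^{-63} \cdot 2^{-15} \cdot 2^{-15} = 2^{-93}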

By default, continuous ingest writes the following (see the sketch after the list):

 * row = <63 bit random>
 * cf = <15 bit random>
 * cq = <15 bit random>
 * val = <ingest client uuid>:<pointer to another row> (there are some other fields in the value)
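
Roughly, that layout could be sketched like this (not the actual ContinuousIngest code; the hex formatting and the "pointer" handling here are placeholders for illustration):

{code:java}
import java.util.Random;
import java.util.UUID;

public class EntrySketch {
  public static void main(String[] args) {
    Random rand = new Random();                 // the PRNG whose seed the comment above worries about
    String clientUuid = UUID.randomUUID().toString();

    long prevRow = rand.nextLong() >>> 1;       // stand-in for a previously written row
    long row = rand.nextLong() >>> 1;           // 63 bit random row (top bit cleared)
    int cf = rand.nextInt(1 << 15);             // 15 bit random column family
    int cq = rand.nextInt(1 << 15);             // 15 bit random column qualifier

    // val = <ingest client uuid>:<pointer to another row>; the real ingest client
    // also puts other fields in the value, which are omitted here.
    String val = clientUuid + ":" + String.format("%016x", prevRow);

    System.out.printf("row=%016x cf=%04x cq=%04x val=%s%n", row, cf, cq, val);
  }
}
{code}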

I was thinking row 5 would be deleted.  Also, if that were the case, it would have left a dangling
pointer from 3 to 5, but it's not.



> Add compaction filter to continuous ingest
> ------------------------------------------
>
>                 Key: ACCUMULO-2205
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-2205
>             Project: Accumulo
>          Issue Type: Sub-task
>            Reporter: Keith Turner
>
> It would be useful to run a compaction that deletes all of the nodes written by a given
> ingest client (each ingest client writes a uuid that this filter could use).  This would
> probably be best done after verification (or on a clone, in parallel to verification).
> For example, one could do the following steps in testing.
> # run ingest for a time period
> # stop ingest
> # verify
> # run compaction filter to delete data written by one or more ingest clients
> # verify
> It's possible that ingest clients can overwrite each other's nodes, but it seems like
> this would not cause a problem.  Below is one example where it does not.
>  # ingest client A writes 2:A->3:A->5:A->6:A->7:A
>  # ingest client B writes 12:B->13:B->5:B->16:B->17:B
>  # every thing written by B is deleted
> In the above case, {{2:A->3:A}} and {{6:A->7:A}} would be the only things left.
> There are no pointers to undefined nodes.
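
A minimal sketch of the "run compaction filter" step above, using Accumulo's {{Filter}} iterator base class.  The option name and the assumption that matching can be done by checking whether the value starts with "<uuid>:" are illustrative, not part of this ticket:

{code:java}
import java.io.IOException;
import java.util.Map;

import org.apache.accumulo.core.data.Key;
import org.apache.accumulo.core.data.Value;
import org.apache.accumulo.core.iterators.Filter;
import org.apache.accumulo.core.iterators.IteratorEnvironment;
import org.apache.accumulo.core.iterators.SortedKeyValueIterator;

/**
 * Drops entries whose value starts with the configured ingest client uuid,
 * i.e. entries written by that client (value format: <uuid>:<pointer>...).
 */
public class DeleteClientFilter extends Filter {

  private static final String UUID_OPT = "clientUuid"; // illustrative option name
  private byte[] uuidPrefix;

  @Override
  public void init(SortedKeyValueIterator<Key,Value> source, Map<String,String> options,
      IteratorEnvironment env) throws IOException {
    super.init(source, options, env);
    uuidPrefix = (options.get(UUID_OPT) + ":").getBytes();
  }

  @Override
  public boolean accept(Key k, Value v) {
    byte[] val = v.get();
    if (val.length < uuidPrefix.length)
      return true;                              // too short to carry the uuid prefix; keep it
    for (int i = 0; i < uuidPrefix.length; i++)
      if (val[i] != uuidPrefix[i])
        return true;                            // written by some other client; keep it
    return false;                               // written by the target client; drop it
  }
}
{code}

Such a filter could then be attached to a user-initiated compaction as an {{IteratorSetting}}; the exact wiring is left out of this sketch.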



