accumulo-dev mailing list archives

From Dylan Hutchison <dhutc...@uw.edu>
Subject Re: another question on summing combiner
Date Fri, 23 Oct 2015 20:42:25 GMT
Hi Z,

The batch method you and Josh worked out at first will work.  I was
illustrating another method which uses conditional writes/deletes as an
alternative.  It's hard to say which performs better without knowing your
workload specifics.

Applying the conditional write method to your scenario: when the client
retries the delete-then-add operation, it would not write the "-1" to the
stats table during the delete phase, because the keys are already deleted
from the main table.  The conditional mutation is rejected, so the client
never makes the erroneous "retried" -1 write.  The client then resumes with
the insert phase, writing the keys back to the main table and a "+1" to the
stats table as intended.
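
To make the difference concrete, here is a minimal sketch of your scenario
as a toy in-memory model in Python.  It is not the Accumulo API: the class
ToyTables and the functions blind_delete, conditional_delete and run are
hypothetical names made up for illustration, and the model ignores failures
between the main table write and the stats table write.

```python
from collections import defaultdict


class ToyTables:
    """Toy stand-in for the main table plus a summing-combiner stats table."""

    def __init__(self):
        self.main = set()
        self.stats = defaultdict(int)  # combiner: every write is summed

    def insert(self, key):
        self.main.add(key)
        self.stats[key] += 1

    def blind_delete(self, key):
        # Unconditional: always writes -1, even if the key is already gone.
        self.main.discard(key)
        self.stats[key] -= 1

    def conditional_delete(self, key):
        # Conditional: rejected when the key is absent from the main table,
        # so the client skips the -1 write to the stats table.
        if key not in self.main:
            return False
        self.main.remove(key)
        self.stats[key] -= 1
        return True


def run(delete_name):
    """Replay the scenario using the named delete method (hypothetical helper)."""
    t = ToyTables()
    for k in "ABCD":
        t.insert(k)
    delete = getattr(t, delete_name)
    delete("C")
    # First attempt at "update B and D" fails right after the delete phase.
    for k in "BD":
        delete(k)
    # The client retries the entire request: delete phase, then insert phase.
    for k in "BD":
        delete(k)
    for k in "BD":
        t.insert(k)
    return sorted(t.main), dict(t.stats)


if __name__ == "__main__":
    print(run("blind_delete"))
    print(run("conditional_delete"))
```

With blind_delete the stats for B and D end at 0, as in your example; with
conditional_delete the retried deletes are rejected and B and D end at 1.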

Cheers, Dylan

On Fri, Oct 23, 2015 at 1:13 PM, z11373 <z11373@outlook.com> wrote:

> Hi Dylan,
> Right now we don't perform a check (read) before performing an update.
> Below is a simple scenario.
>
> The main table is initially empty, then the client sends a request which
> translates to inserting the data, i.e.
> Main table:
> A
> B
> C
> D
>
> Stats table:
> A 1
> B 1
> C 1
> D 1
>
> Let's say the next request is to delete C.
> Main table:
> A
> B
> D
>
> Stats table:
> A 1
> B 1
> C 0 (1 + -1)
> D 1
>
> The next request is to update B and D (the request is translated to delete
> B and D, then insert B and D), but let's say it somehow fails between the
> delete and insert operations, so the tables would look like:
> Main table:
> A
>
> Stats table:
> A 1
> B 0
> C 0
> D 0
>
> The client is fault-tolerant and retries the entire request, so now the
> tables would look like:
> Main table:
> A
> B
> D
>
> Stats table:
> A 1
> B 0 (-1 + 1)
> C 0
> D 0 (-1 + 1)
>
>
> As you can see above, the end state of the Main table is correct, because
> the retry will do the 'update', but unfortunately the Stats table is not.
> The idea I mentioned last time was to have a batch job that scans the whole
> Main table to get the 'truth' data and updates the Stats table accordingly.
> But in order to update 'accordingly', it first has to read the current
> value in the Stats table (due to the combiner), which affects performance.
>
>
> Thanks,
> Z
