accumulo-notifications mailing list archives

From "Josh Elser (JIRA)" <>
Subject [jira] [Commented] (ACCUMULO-2232) Combiners can cause deleted data to come back
Date Wed, 22 Jan 2014 20:45:23 GMT


Josh Elser commented on ACCUMULO-2232:

I'm a little worried about the implications (sorry for using that phrase) that running combiners
only on full MajCs would have on performance: under heavy combination, you would be persisting
and later re-reading many records instead of just one, potentially for a very long time (if
you assume that full MajCs are few and far between).

I can't come up with another easy way to fix it for the SummingCombiner example, though, so
being accurate is still better than being fast. Anything else I can think of would involve persisting
deletes across non-full compactions, which would require quite a bit more work to get correct,
I imagine.
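
To make the performance concern concrete, here is a rough toy model (not Accumulo code). It
counts how many entries for a single key get rewritten by partial compactions. The function
name, the fixed fan-in trigger, and all parameters are assumptions made for illustration only;
real compaction scheduling is more involved.

```python
def entries_rewritten(flushes, updates_per_flush, fanin, combine_on_partial):
    """Count entries written by partial compactions for one key (toy model).

    Each flush adds one file. Whenever `fanin` files exist they are merged
    into a single file. If combining is allowed on partial compactions, every
    file holds one combined entry; otherwise the entries just accumulate.
    """
    files = []    # each file is represented by its entry count for the key
    written = 0
    for _ in range(flushes):
        files.append(1 if combine_on_partial else updates_per_flush)
        if len(files) >= fanin:
            merged = 1 if combine_on_partial else sum(files)
            written += merged      # entries persisted by this compaction
            files = [merged]       # the merged file replaces its inputs
    return written


with_combining = entries_rewritten(9, 10, 3, combine_on_partial=True)
without_combining = entries_rewritten(9, 10, 3, combine_on_partial=False)
print(with_combining, without_combining)
```

In this model the uncombined entries are carried along and rewritten by every partial
compaction until a full MajC finally collapses them.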

> Combiners can cause deleted data to come back
> ---------------------------------------------
>                 Key: ACCUMULO-2232
>                 URL:
>             Project: Accumulo
>          Issue Type: Bug
>          Components: client, tserver
>            Reporter: John Vines
> The case-
> 3 files with-
> * 1 with a key, k, with timestamp 0, value 3
> * 1 with a delete of k with timestamp 1
> * 1 with k with timestamp 2, value 2
> The column of k has a summing combiner set on it. The issue here is that, depending on
> how the major compactions play out, differing values will result. If all 3 files compact,
> the correct value of 2 will result. However, if 1 & 3 compact first, they will aggregate
> to 5. The delete will then fall after the combined value, so the result is 5.
> First and foremost, this should be documented. I think to remedy this, combiners should
> only be used on full MajCs, not on non-full ones. This may necessitate a special flag or a new
> combiner that implements the proper semantics.
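
The three-file case above can be reproduced with a small simulation. This is a minimal sketch
under simplifying assumptions, not Accumulo's implementation: `Entry`, `compact`, and `scan` are
hypothetical names, a delete is modelled as an entry with value `None`, and the combiner is
assumed to emit its sum at the newest timestamp it saw.

```python
from typing import NamedTuple, Optional


class Entry(NamedTuple):
    key: str
    ts: int
    value: Optional[int]  # None marks a delete


def compact(files, full, combine=True):
    """Merge files into one (toy model of a major compaction)."""
    # Newest timestamp first; at equal timestamps a delete sorts before a value.
    entries = sorted((e for f in files for e in f),
                     key=lambda e: (e.key, -e.ts, e.value is not None))
    by_key = {}
    for e in entries:
        by_key.setdefault(e.key, []).append(e)
    out = []
    for key, versions in by_key.items():
        live, delete = [], None
        for e in versions:
            if e.value is None:
                delete = e      # everything older is masked by this delete
                break
            live.append(e)
        if combine and live:
            # Summing combiner: one entry carrying the newest timestamp.
            live = [Entry(key, live[0].ts, sum(e.value for e in live))]
        out.extend(live)
        if delete is not None and not full:
            out.append(delete)  # only a full compaction may drop the delete
    return out


def scan(files, key):
    """Value a reader would see for `key` across all files."""
    for e in compact(files, full=True):
        if e.key == key:
            return e.value
    return None


f1 = [Entry("k", 0, 3)]
f2 = [Entry("k", 1, None)]   # delete of k
f3 = [Entry("k", 2, 2)]

all_three = scan([f1, f2, f3], "k")                                    # 2
partial = scan([compact([f1, f3], full=False), f2], "k")               # 5
deferred = scan([compact([f1, f3], full=False, combine=False), f2], "k")  # 2
print(all_three, partial, deferred)
```

With combining skipped on the partial compaction (`combine=False`), the entry at timestamp 0
stays separate, so the delete at timestamp 1 still masks it and the result is 2 again.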

