accumulo-dev mailing list archives

From z11373 <z11...@outlook.com>
Subject Re: another question on summing combiner
Date Wed, 07 Oct 2015 19:55:22 GMT
Revisiting this topic: if I go with option #2, i.e. having a batch job fix
the stats table, I am no longer sure it will work. The stats table already
has the summing combiner enabled, so the batch job can't simply write the
corrected value: the combiner would add it to the existing value and the
result would be incorrect.
For example:

Current stats table contains:
foo     | 2
bar     | 3
test    | 1

The batch job scans the main table and then updates the stats table. Let's
say the actual stats are foo=1, bar=4, test=1; the final stats table would
then become:
foo     | 3
bar     | 7
test    | 2
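
A minimal sketch of the effect above, using a plain Python dict as a toy
stand-in for a table with a summing combiner (this is not the Accumulo API;
the class and method names are made up for illustration):

```python
from collections import defaultdict


class SummingStatsTable:
    """Toy model: every write to a key is added to the stored value,
    which is the observable effect of a summing combiner."""

    def __init__(self, initial=None):
        self.values = defaultdict(int)
        for key, count in (initial or {}).items():
            self.values[key] += count

    def put(self, key, count):
        # A summing table never overwrites; it accumulates.
        self.values[key] += count

    def snapshot(self):
        return dict(self.values)


# Current (possibly wrong) contents of the stats table.
stats = SummingStatsTable({"foo": 2, "bar": 3, "test": 1})

# Counts the batch job computed by scanning the main table.
actual_counts = {"foo": 1, "bar": 4, "test": 1}

# The batch job writes the absolute counts, but they get summed in.
for key, count in actual_counts.items():
    stats.put(key, count)

result = stats.snapshot()
print(result)  # {'foo': 3, 'bar': 7, 'test': 2}
```

The written values are treated as increments, so the table ends up with the
old value plus the actual count instead of the actual count itself.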

It would be correct if the batch job removed the summing combiner from the
table, but then another process (not the batch job) may update a particular
key, overwriting the correct value written by the batch job. We can't
tolerate the system being offline; otherwise we could refresh the stats
during that downtime. Any ideas on how to solve this problem?

Unfortunately there is an inherent problem with the summing combiner
approach: adding a key that already exists in the main table behaves just
like an 'update', but my current logic still adds <key>|1 to the stats
table, so if we have many 'updates', some values in the stats table will be
far off. Deleting is similar: it is a no-op for the main table if the key
doesn't exist, but the app logic still adds <key>|-1 to the stats table.
This is why we're thinking of having a batch job 'fix' the stats table, but
that has its own problem :-(
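
A rough sketch of how the drift described above builds up. Both tables are
modeled as plain Python dicts, the stats key is assumed to be the same as
the main-table key, and app_put / app_delete are hypothetical names for the
app logic:

```python
from collections import defaultdict

main_table = {}           # key -> value; a put on an existing key overwrites
stats = defaultdict(int)  # key -> count; behaves like a summing combiner


def app_put(key, value):
    # Main table: insert or overwrite ('update').
    main_table[key] = value
    # Stats table: always +1, even when the key already existed.
    stats[key] += 1


def app_delete(key):
    # Main table: no-op if the key does not exist.
    main_table.pop(key, None)
    # Stats table: always -1, even when nothing was deleted.
    stats[key] -= 1


# Three writes to the same key: one insert followed by two updates.
app_put("foo", "v1")
app_put("foo", "v2")
app_put("foo", "v3")

# Delete of a key that was never in the main table.
app_delete("bar")

print(len(main_table), dict(stats))
```

Here the main table holds a single entry for foo while the stats table
reports 3, and bar shows -1 although it was never present.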


Thanks,
Z






