accumulo-dev mailing list archives

From Josh Elser <josh.el...@gmail.com>
Subject Re: data comparison tool
Date Thu, 12 Nov 2015 16:28:01 GMT
Yep, that's an easy way to check. It can just be slow depending on how 
much data you have.
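
For reference, the merge-style comparison from the quoted message below could look roughly like this. It is only a sketch: it assumes each table has already been read into a sequence of (row, value) pairs sorted by row, and the name diff_sorted is made up for illustration. It is not an Accumulo API, and a real comparison would read from a Scanner on each instance.

```python
def diff_sorted(a, b):
    """Walk two row-sorted sequences of (row, value) pairs in lockstep.

    Yields (row, value_in_a, value_in_b) for every row that is missing
    on one side (reported as None) or whose values differ.
    """
    ia, ib = iter(a), iter(b)
    ea, eb = next(ia, None), next(ib, None)
    while ea is not None or eb is not None:
        if eb is None or (ea is not None and ea[0] < eb[0]):
            # Row exists only in a.
            yield (ea[0], ea[1], None)
            ea = next(ia, None)
        elif ea is None or eb[0] < ea[0]:
            # Row exists only in b.
            yield (eb[0], None, eb[1])
            eb = next(ib, None)
        else:
            # Same row on both sides: report it only if the values differ.
            if ea[1] != eb[1]:
                yield (ea[0], ea[1], eb[1])
            ea, eb = next(ia, None), next(ib, None)
```

This is a single pass over both tables, so the run time grows with the total amount of data, which is where the slowness comes from.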

I tried to write a slightly more parallel approach to verifying this, 
based on a Merkle tree.

https://github.com/apache/accumulo/tree/master/test/system/merkle-replication

It's a little tricky as the boundaries of each leaf-node in the tree (a 
tablet) can affect the root value of the tree. In other words, if you 
don't have the same split points on both tables, the verification would 
fail.
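
To make the split-point issue concrete, here is a toy illustration. It is not the code from the repository linked above: the function names, the use of SHA-256, and the way entries are fed into each leaf hash are all assumptions chosen for the example. Each leaf hashes the entries of one "tablet", and a split point is treated as the inclusive end row of its tablet.

```python
import hashlib
from bisect import bisect_left


def leaf_hashes(entries, splits):
    """Return one digest per tablet, given sorted split points.

    A row belongs to the first tablet whose split point is >= the row;
    rows past the last split point go to the final tablet.
    """
    buckets = [hashlib.sha256() for _ in range(len(splits) + 1)]
    for row, value in entries:
        h = buckets[bisect_left(splits, row)]
        # Zero bytes separate the fields so ("ab", "c") != ("a", "bc").
        h.update(row.encode() + b"\x00" + value.encode() + b"\x00")
    return [h.digest() for h in buckets]


def merkle_root(leaves):
    """Combine leaf digests pairwise until a single root digest remains."""
    level = list(leaves)
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level), 2):
            pair = level[i:i + 2]
            # An unpaired node at the end of a level is carried up unchanged.
            nxt.append(hashlib.sha256(b"".join(pair)).digest()
                       if len(pair) == 2 else pair[0])
        level = nxt
    return level[0]


data = [("a", "1"), ("b", "2"), ("c", "3"), ("d", "4")]
same_splits = merkle_root(leaf_hashes(data, ["b"])) == merkle_root(leaf_hashes(data, ["b"]))
other_splits = merkle_root(leaf_hashes(data, ["b"])) == merkle_root(leaf_hashes(data, ["c"]))
```

Here same_splits is True and other_splits is False: the data is identical in both cases, but moving the split point from "b" to "c" changes the leaf contents and therefore the root.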

z11373 wrote:
> We currently write to tables in 2 places (this may change once we leverage
> Accumulo 1.7 replication feature or another solution). I wonder if Accumulo
> provides (or someone already wrote) the tool to compare data from both
> tables (from 2 different Accumulo instances)?
> The naïve solution I can think of is to iterate over both tables (since they are
> already sorted by row ID) and perform something like a 'merge' comparison, but it
> would definitely save me time if someone has already written the implementation.
>
> Thanks,
> Z
>
>
>
> --
> View this message in context: http://apache-accumulo.1065345.n5.nabble.com/data-comparison-tool-tp15537.html
> Sent from the Developers mailing list archive at Nabble.com.
