hadoop-common-user mailing list archives

From Rob Styles <...@dynamicorange.com>
Subject Pairwise Comparison of Large Datasets
Date Mon, 31 Dec 2012 18:39:14 GMT
Happy New Year :)

Thought some of you might find this useful.

We've developed an approach to pairwise comparison of large datasets
that never requires the whole dataset to be visible at once. The
approach brings pairs together for comparison by using incrementing
coordinates to target each value at a cell.
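
To illustrate the general idea of targeting values at cells, here is a
minimal Python sketch of one way it could work in a map/shuffle/reduce
style. It is an assumption-laden illustration, not our actual
implementation: the function names (map_record, shuffle, reduce_cell,
all_pairs) are hypothetical, records are assumed to have been numbered
0..N-1 beforehand, and the shuffle is simulated in memory purely for
demonstration.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

Cell = Tuple[int, int]       # (row, col) with row < col
Tagged = Tuple[int, str]     # (record index, record value)


def map_record(index: int, value: str, total: int) -> Iterator[Tuple[Cell, Tagged]]:
    """Emit one record to every cell it shares with another record.

    Only this record, its index, and the total count are needed here,
    so the mapper never has to see the rest of the dataset.
    """
    for other in range(total):
        if other == index:
            continue  # no self-comparison
        cell = (min(index, other), max(index, other))
        yield cell, (index, value)


def shuffle(emitted: Iterable[Tuple[Cell, Tagged]]) -> Dict[Cell, List[Tagged]]:
    """Group emitted values by cell (a stand-in for the framework's shuffle)."""
    grouped: Dict[Cell, List[Tagged]] = defaultdict(list)
    for cell, tagged in emitted:
        grouped[cell].append(tagged)
    return grouped


def reduce_cell(cell: Cell, values: List[Tagged],
                compare: Callable[[str, str], int]) -> Tuple[Cell, int]:
    """Compare the two values that landed in one cell."""
    # Sort by index so the result does not depend on arrival order.
    (_, left), (_, right) = sorted(values, key=lambda tagged: tagged[0])
    return cell, compare(left, right)


def all_pairs(records: List[str],
              compare: Callable[[str, str], int]) -> Dict[Cell, int]:
    """Run the map, shuffle, and reduce steps over a small in-memory list."""
    total = len(records)
    emitted = (kv for i, rec in enumerate(records)
               for kv in map_record(i, rec, total))
    return dict(reduce_cell(cell, values, compare)
                for cell, values in shuffle(emitted).items())


if __name__ == "__main__":
    result = all_pairs(["a", "bb", "ccc"], lambda x, y: abs(len(x) - len(y)))
    print(result)  # {(0, 1): 1, (0, 2): 2, (1, 2): 1}
```

Each reducer in this sketch sees exactly two values, so memory use per
comparison is constant. The cost is that every record is emitted N-1
times.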


There is still work to do on making the approach more efficient and trying
to eliminate a pre-processing step. Help gratefully received.

If there's a toolset already out there for doing this I'd be happy to hear
about that too!


