accumulo-user mailing list archives

From Dave Hardcastle <>
Subject Mini Accumulo cluster
Date Wed, 13 May 2015 20:02:08 GMT

Is it crazy to use a MiniAccumuloCluster to measure the *relative*
performance of two different implementations of iterators?

Obviously it would be better to do it on a real Accumulo cluster, but
that's not possible for several reasons.

The approach would be something like:
- Fire up a Mini cluster
- Bulk import a file
- Start timer
- Set up a BatchScanner with one of the iterator stacks and use it to query
for lots of different ranges
- Iterate through all of the results
- Stop timer

Repeat with the other implementation of the iterators.
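Purely as an illustration, the timing part of the steps above might look like the Java sketch below. It uses only the JDK, so the two `LongSupplier`s are hypothetical stand-ins. In a real run, each would (assuming the Accumulo 1.x client API) create a BatchScanner via `connector.createBatchScanner(table, Authorizations.EMPTY, numThreads)`, call `setRanges(...)` and `addScanIterator(...)` for one iterator stack, drain the results and return the entry count. The names `IteratorBenchmark`, `compare`, `timeOnePass` and `busyWork` are made up for this example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.LongSupplier;

public class IteratorBenchmark {

    // Written to after every pass so the JIT cannot treat the scan work as dead code.
    static volatile long blackhole;

    /** Times one full query pass and returns the elapsed wall-clock time in nanoseconds. */
    static long timeOnePass(LongSupplier queryPass) {
        long start = System.nanoTime();
        blackhole += queryPass.getAsLong(); // e.g. number of entries read by the scan
        return System.nanoTime() - start;
    }

    /** Median of a non-empty list of durations. */
    static long median(List<Long> xs) {
        List<Long> sorted = new ArrayList<>(xs);
        Collections.sort(sorted);
        int n = sorted.size();
        return n % 2 == 1
                ? sorted.get(n / 2)
                : (sorted.get(n / 2 - 1) + sorted.get(n / 2)) / 2;
    }

    /**
     * Runs untimed warm-up passes, then `trials` timed passes of each implementation,
     * alternating which one goes first. Returns {medianA, medianB} in nanoseconds.
     * Assumes trials >= 1.
     */
    static long[] compare(LongSupplier a, LongSupplier b, int warmups, int trials) {
        for (int i = 0; i < warmups; i++) {
            timeOnePass(a);
            timeOnePass(b);
        }
        List<Long> timesA = new ArrayList<>();
        List<Long> timesB = new ArrayList<>();
        for (int i = 0; i < trials; i++) {
            if (i % 2 == 0) {
                timesA.add(timeOnePass(a));
                timesB.add(timeOnePass(b));
            } else {
                timesB.add(timeOnePass(b));
                timesA.add(timeOnePass(a));
            }
        }
        return new long[] {median(timesA), median(timesB)};
    }

    // Hypothetical CPU-bound stand-in for "scan all ranges through one iterator stack".
    static long busyWork(int n) {
        long acc = 0;
        for (int i = 0; i < n; i++) {
            acc += (i * 31L) ^ (acc >>> 3);
        }
        return acc;
    }

    public static void main(String[] args) {
        // Replace these with passes that scan the mini cluster using each iterator stack.
        LongSupplier implA = () -> busyWork(2_000_000);
        LongSupplier implB = () -> busyWork(4_000_000);
        long[] m = compare(implA, implB, 3, 11);
        System.out.printf("median A = %.3f ms, median B = %.3f ms, B/A = %.2f%n",
                m[0] / 1e6, m[1] / 1e6, (double) m[1] / m[0]);
    }
}
```

Alternating which implementation runs first in each trial, and reporting medians rather than means, is one way to keep GC pauses or background work in the mini cluster from landing on only one side. If the two medians sit within the run-to-run spread, that would be the "not measurable" case.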

Of course, the difference in performance may not be measurable if the time
is dominated by disk seeks, but that would still be useful information. The
absolute performance also wouldn't be representative of what you'd get on a
real cluster, as there's no network latency in these trials, but that's fine:
I'm mainly interested in which of the two iterator implementations is more
performant.

Similarly, could the same approach be used to compare the performance on
SSD vs hard disk?


