accumulo-user mailing list archives

From William Slacum
Subject Re: Is it possible to use an iterator to aggregate results of a BatchScanner?
Date Mon, 11 Jun 2012 20:46:49 GMT
So, is a global sorting order required of your iterator? That's really
the key behavioral difference in terms of output when you're dealing
with a Scanner versus a BatchScanner.

Please correct me if I'm wrong about assuming you're trying to get a
distribution for the column families that appear in a given set of

You can count the column qualifiers per tablet or per row on the server
side using an Accumulo iterator. Then, as you iterate over your scanner
on the client, you can merge those partial counts using a map.

BatchScanner scan = connector.createBatchScanner(...);
// set up a column family counting/skipping iterator

HashMap<Text, AtomicLong> cqCounts = new HashMap<Text, AtomicLong>();

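for(Entry<Key, Value> e : scan) {
  // keyed by column family here; use getColumnQualifier() to count qualifiers instead
  Text cf = e.getKey().getColumnFamily();
  AtomicLong cqCount = cqCounts.get(cf);
  if(cqCount == null) {
     cqCount = new AtomicLong();
     cqCounts.put(cf, cqCount);
  }
  // each value is a partial count emitted by the server-side iterator
  cqCount.addAndGet(Long.parseLong(new String(e.getValue().get())));
}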

(please excuse any old/deprecated APIs used)
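
The client-side merge can be tried without a cluster. The following is a minimal sketch, not Accumulo code: PartialCountMerger and its entry() helper are hypothetical names, plain String/byte[] pairs stand in for Key/Value entries, and it assumes the server-side iterator writes each partial count as a UTF-8 decimal string. Because the merge only sums per column, it does not depend on the order in which a BatchScanner returns results.

```java
import java.nio.charset.StandardCharsets;
import java.util.AbstractMap.SimpleEntry;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Accumulo: sums partial counts that
// arrive in arbitrary order, as they would from a BatchScanner.
final class PartialCountMerger {

  // Each entry stands in for one (Key, Value) pair: the entry key is the
  // column name, the entry value is a count encoded as a UTF-8 decimal
  // string (an assumption about the server-side iterator's output format).
  static Map<String, Long> merge(Iterable<Map.Entry<String, byte[]>> partials) {
    Map<String, Long> totals = new HashMap<>();
    for (Map.Entry<String, byte[]> e : partials) {
      long partial = Long.parseLong(new String(e.getValue(), StandardCharsets.UTF_8));
      // Add to the running total, starting from this value if the column is new.
      totals.merge(e.getKey(), partial, Long::sum);
    }
    return totals;
  }

  // Builds a stand-in entry for illustration only.
  static Map.Entry<String, byte[]> entry(String column, String count) {
    return new SimpleEntry<>(column, count.getBytes(StandardCharsets.UTF_8));
  }

  public static void main(String[] args) {
    // The same column shows up twice, as if reported by two different ranges.
    List<Map.Entry<String, byte[]>> partials = Arrays.asList(
        entry("name", "3"), entry("age", "1"), entry("name", "2"));
    System.out.println(merge(partials));
  }
}
```

Map.merge replaces the get/null-check/put sequence, so no AtomicLong wrapper is needed in this variant.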

On Mon, Jun 11, 2012 at 2:21 PM, Hunter Provyn wrote:
> I have a SkippingIterator that skips entries with cq that it has seen
> before.
> It works on a Scanner, but on a BatchScanner, the iterators from different
> threads don't communicate, so the result is that results within a single
> range are unique, but across the whole set of ranges, are not unique.
> I'd prefer to perform the aggregation within the iterators if possible, but
> I don't know how.
> Also, thanks for your previous help, William, Keith, Bob and David.