accumulo-user mailing list archives

From "Marc P." <>
Subject Re: Is it possible to use an iterator to aggregate results of a BatchScanner?
Date Mon, 11 Jun 2012 20:52:34 GMT
It may also serve you to extend the appropriate aggregator, setting
its source iterator to the batch scanner's iterator. You can then
iterate over the aggregated result set (if possible).

I haven't actually tried this, and you would be limited by memory at
the client (depending on the size of your result set). Mr. Slacum's
approach wouldn't suffer from that particular limitation. With this
one, however, you could stack the iterators in the same way the tablet
servers do.
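
As a rough sketch of the client-side idea (not tested against Accumulo itself): the wrapper below takes any iterator, such as the one a BatchScanner returns, and skips entries whose column qualifier has already been seen. SeenQualifierFilter and uniqueQualifiers are hypothetical names, and plain Entry<String, String> pairs (qualifier, value) stand in for Accumulo's Entry<Key, Value> so the example runs without a cluster. The "seen" set is what makes this approach bounded by client memory.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Map.Entry;
import java.util.NoSuchElementException;
import java.util.Set;

// Hypothetical client-side wrapper: skips entries whose qualifier was already seen.
// Entry<String, String> (qualifier, value) is a stand-in for Accumulo's Entry<Key, Value>.
class SeenQualifierFilter implements Iterator<Entry<String, String>> {
  private final Iterator<Entry<String, String>> source;
  // Grows with the number of distinct qualifiers, hence the client memory limit.
  private final Set<String> seen = new HashSet<>();
  private Entry<String, String> next;

  SeenQualifierFilter(Iterator<Entry<String, String>> source) {
    this.source = source;
    advance();
  }

  // Pull from the source until an entry with an unseen qualifier turns up.
  private void advance() {
    next = null;
    while (source.hasNext()) {
      Entry<String, String> candidate = source.next();
      if (seen.add(candidate.getKey())) {
        next = candidate;
        return;
      }
    }
  }

  @Override
  public boolean hasNext() {
    return next != null;
  }

  @Override
  public Entry<String, String> next() {
    if (next == null) {
      throw new NoSuchElementException();
    }
    Entry<String, String> result = next;
    advance();
    return result;
  }

  // Convenience helper for the demo: collect the qualifiers that survive the filter.
  static List<String> uniqueQualifiers(Iterator<Entry<String, String>> source) {
    List<String> out = new ArrayList<>();
    Iterator<Entry<String, String>> it = new SeenQualifierFilter(source);
    while (it.hasNext()) {
      out.add(it.next().getKey());
    }
    return out;
  }

  public static void main(String[] args) {
    // Simulates entries arriving from several ranges, with cq1 repeated.
    List<Entry<String, String>> merged = new ArrayList<>();
    merged.add(new SimpleEntry<>("cq1", "a"));
    merged.add(new SimpleEntry<>("cq2", "b"));
    merged.add(new SimpleEntry<>("cq1", "c"));
    System.out.println(uniqueQualifiers(merged.iterator()));
  }
}
```

Because a BatchScanner returns entries in no particular order, which duplicate survives depends on arrival order; only the set of qualifiers is stable.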

Sent from my phone, may contain spelling errors

On Mon, Jun 11, 2012 at 4:46 PM, William Slacum <> wrote:
> So, is a global sorting order required of your iterator? That's really
> the key behavioral difference in terms of output when you're dealing
> with a Scanner versus a BatchScanner.
> Please correct me if I'm wrong about assuming you're trying to get a
> distribution for the column families that appear in a given set of
> ranges.
> You can count the column qualifiers on a per-tablet/row basis
> server-side using an Accumulo iterator, and as you iterate over your
> scanner, you can merge those counts using a map.
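> {{{
> // needs java.util.{HashMap, Map, Map.Entry}, java.util.concurrent.atomic.AtomicLong,
> // org.apache.hadoop.io.Text, and org.apache.accumulo.core.data.{Key, Value}
> BatchScanner scan = connector.createBatchScanner(...);
> // set up a column family counting/skipping iterator
> Map<Text, AtomicLong> cqCounts = new HashMap<Text, AtomicLong>();
> for (Entry<Key, Value> e : scan) {
>   Text cf = e.getKey().getColumnFamily();
>   AtomicLong cqCount = cqCounts.get(cf);
>   if (cqCount == null) {
>     // first partial count seen for this column family
>     cqCount = new AtomicLong();
>     cqCounts.put(cf, cqCount);
>   }
>   // assumes each value holds a partial count encoded as a decimal string
>   cqCount.addAndGet(Long.parseLong(new String(e.getValue().get())));
> }
> scan.close(); // release the BatchScanner's resources
> }}}
> (please excuse any old/deprecated APIs used)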
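
The merge step in the quoted loop can also be written with Map.merge on Java 8 or later. This is a minimal sketch, assuming each server-side iterator emits (column family, partial count as a decimal string) pairs; PartialCountMerger is a hypothetical name, and the String pairs stand in for Accumulo's Key and Value so the example runs on its own.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Map.Entry;

// Hypothetical helper: sums partial counts that arrive under the same key.
class PartialCountMerger {
  // Each entry is (column family, partial count as a decimal string).
  static Map<String, Long> merge(Iterable<Entry<String, String>> partials) {
    Map<String, Long> totals = new HashMap<>();
    for (Entry<String, String> e : partials) {
      // Inserts the count if the key is new, otherwise adds it to the running total.
      totals.merge(e.getKey(), Long.parseLong(e.getValue()), Long::sum);
    }
    return totals;
  }

  public static void main(String[] args) {
    // Simulates partial counts for the same families coming back from different tablets.
    List<Entry<String, String>> partials = new ArrayList<>();
    partials.add(new SimpleEntry<>("cfA", "3"));
    partials.add(new SimpleEntry<>("cfB", "1"));
    partials.add(new SimpleEntry<>("cfA", "4"));
    Map<String, Long> totals = merge(partials);
    System.out.println("cfA=" + totals.get("cfA") + ", cfB=" + totals.get("cfB"));
  }
}
```

Since addition is commutative, the unordered delivery of a BatchScanner does not affect the totals.
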
> On Mon, Jun 11, 2012 at 2:21 PM, Hunter Provyn <> wrote:
>> I have a SkippingIterator that skips entries with a cq that it has seen
>> before.
>> It works on a Scanner, but on a BatchScanner, the iterators from different
>> threads don't communicate, so results are unique within a single range
>> but not across the whole set of ranges.
>> I'd prefer to perform the aggregation within the iterators if possible, but
>> I don't know how.
>> Also, thanks for your previous help, William, Keith, Bob and David.
