accumulo-user mailing list archives

From "Marc P." <marc.par...@gmail.com>
Subject Re: Is it possible to use an iterator to aggregate results of a BatchScanner?
Date Mon, 11 Jun 2012 20:54:20 GMT
I should point out that, depending on what your iterators do (and,
more importantly, what they store), you may be limited by memory.
It depends on multiple factors, obviously.

 ---
 Sent from my phone, may contain spelling wrrors
On Mon, Jun 11, 2012 at 4:52 PM, Marc P. <marc.parisi@gmail.com> wrote:
> It may also help to extend the appropriate aggregator and set its
> source iterator to the BatchScanner's iterator. You can then
> iterate over the aggregated result set (if possible).
>
> I haven't actually tried this, but you would be limited by memory at
> the client (depending on the size of your result set). Mr. Slacum's
> approach wouldn't have that particular problem; with this one, however,
> you could stack the iterators the same way the tablet servers do.
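
A minimal, untested sketch of stacking a filter on the client, assuming you only need to drop entries whose column qualifier has already been seen. It swaps in a plain `java.util.Iterator` wrapper instead of extending one of Accumulo's aggregator classes, and the class name `DistinctByKeyIterator` is hypothetical. The `seen` set keeps every distinct key, so client memory grows with the number of distinct qualifiers:

```java
import java.util.HashSet;
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.Set;
import java.util.function.Function;

// Hypothetical client-side wrapper: passes through only the first element
// for each distinct key, across everything the source iterator returns.
// The `seen` set grows with the number of distinct keys (client memory cost).
final class DistinctByKeyIterator<T, K> implements Iterator<T> {
  private final Iterator<T> source;
  private final Function<? super T, ? extends K> keyOf;
  private final Set<K> seen = new HashSet<>();
  private T next;                 // element found by hasNext(), not yet returned
  private boolean hasNextCached;  // true while `next` holds a pending element

  DistinctByKeyIterator(Iterator<T> source, Function<? super T, ? extends K> keyOf) {
    this.source = source;
    this.keyOf = keyOf;
  }

  @Override
  public boolean hasNext() {
    while (!hasNextCached && source.hasNext()) {
      T candidate = source.next();
      // Set.add returns false when the key was already seen: skip duplicates.
      if (seen.add(keyOf.apply(candidate))) {
        next = candidate;
        hasNextCached = true;
      }
    }
    return hasNextCached;
  }

  @Override
  public T next() {
    if (!hasNext()) {
      throw new NoSuchElementException();
    }
    hasNextCached = false;
    T result = next;
    next = null;
    return result;
  }
}
```

Usage, assuming `scan` is an already configured `BatchScanner`: `new DistinctByKeyIterator<>(scan.iterator(), e -> e.getKey().getColumnQualifier())`. Hadoop's `Text` implements `equals` and `hashCode`, so it can serve as the set key.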
>
> On Mon, Jun 11, 2012 at 4:46 PM, William Slacum <wslacum@gmail.com> wrote:
>> So, is a global sorting order required of your iterator? That's really
>> the key behavioral difference in terms of output when you're dealing
>> with a Scanner versus a BatchScanner.
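
If a global order does matter, one option (a sketch, assuming the whole result set fits in client memory) is to buffer the BatchScanner output in a sorted map. `GlobalOrder.sortAll` is a hypothetical helper, not an Accumulo API:

```java
import java.util.Iterator;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical helper: a BatchScanner returns entries in no guaranteed order
// across ranges, so restoring Scanner-like global ordering means buffering
// every entry and sorting it, here by inserting into a TreeMap.
final class GlobalOrder {
  static <K extends Comparable<K>, V> SortedMap<K, V> sortAll(
      Iterator<? extends Map.Entry<K, V>> unordered) {
    SortedMap<K, V> sorted = new TreeMap<>();
    while (unordered.hasNext()) {
      Map.Entry<K, V> e = unordered.next();
      sorted.put(e.getKey(), e.getValue());  // TreeMap keeps keys in natural order
    }
    return sorted;
  }
}
```

Accumulo's `Key` is comparable to itself, so `GlobalOrder.sortAll(scan.iterator())` should type-check with `K = Key` and `V = Value`; the cost is holding every entry on the client at once.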
>>
>> Please correct me if I'm wrong in assuming you're trying to get a
>> distribution of the column families that appear in a given set of
>> ranges.
>>
>> You can count the column qualifiers on a per-tablet/per-row basis on the
>> server side using an Accumulo iterator, and then, as you iterate over your
>> scanner, merge those counts in a map.
>>
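>> {{{
>> import java.nio.charset.StandardCharsets;
>> import java.util.HashMap;
>> import java.util.Map;
>> import java.util.Map.Entry;
>> import java.util.concurrent.atomic.AtomicLong;
>> import org.apache.accumulo.core.client.BatchScanner;
>> import org.apache.accumulo.core.data.Key;
>> import org.apache.accumulo.core.data.Value;
>> import org.apache.hadoop.io.Text;
>>
>> BatchScanner scan = connector.createBatchScanner(...);
>> // set up a column family counting/skipping iterator
>>
>> // merged column qualifier counts, keyed by column family
>> Map<Text, AtomicLong> cqCounts = new HashMap<Text, AtomicLong>();
>>
>> for (Entry<Key, Value> e : scan) {
>>   Text cf = e.getKey().getColumnFamily();
>>   AtomicLong cqCount = cqCounts.get(cf);
>>   if (cqCount == null) {
>>     cqCount = new AtomicLong();
>>     cqCounts.put(cf, cqCount);
>>   }
>>   // each value is a partial count computed server side by the iterator
>>   cqCount.addAndGet(Long.parseLong(new String(e.getValue().get(), StandardCharsets.UTF_8)));
>> }
>> scan.close();
>> }}}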
>>
>> (please excuse any old/deprecated APIs used)
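
The merge step can be checked without a cluster. Below is an Accumulo-free sketch, assuming each server-side entry boils down to a (column family, partial count) pair; `CountMerger` is a hypothetical name. Because the for-loop consumes the BatchScanner on a single client thread, a plain `HashMap<String, Long>` with `Map.merge` should be enough, and `AtomicLong` is not strictly needed:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, Accumulo-free version of the client-side merge: each call to
// add() stands in for one entry emitted by a server-side counting iterator
// for one tablet/range.
final class CountMerger {
  private final Map<String, Long> totals = new HashMap<>();

  // Add one partial count; counts for the same column family are summed.
  void add(String columnFamily, long partialCount) {
    totals.merge(columnFamily, partialCount, Long::sum);
  }

  Map<String, Long> totals() {
    return totals;
  }
}
```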
>>
>> On Mon, Jun 11, 2012 at 2:21 PM, Hunter Provyn <hunter@ccri.com> wrote:
>>> I have a SkippingIterator that skips entries whose column qualifier (cq)
>>> it has seen before.
>>> It works on a Scanner, but on a BatchScanner the iterators in different
>>> threads don't communicate, so results are unique within a single
>>> range but not across the whole set of ranges.
>>> I'd prefer to perform the aggregation within the iterators if possible, but
>>> I don't know how.
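
A small simulation of the behavior described above, assuming each range is handled by its own iterator instance with private state. The class `PerRangeSkipping` is hypothetical and stands in for the server side. Duplicates within a range are dropped, but a qualifier that appears in two ranges comes back twice:

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical simulation: each range is deduplicated independently (like a
// separate SkippingIterator instance with its own "seen" state), and the
// per-range results are concatenated, as the client would receive them.
final class PerRangeSkipping {
  static List<String> scan(List<List<String>> qualifiersPerRange) {
    List<String> out = new ArrayList<>();
    for (List<String> range : qualifiersPerRange) {
      // Fresh state per range: nothing is shared between iterator instances.
      Set<String> distinctInRange = new LinkedHashSet<>(range);
      out.addAll(distinctInRange);
    }
    return out;
  }
}
```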
>>>
>>> Also, thanks for your previous help, William, Keith, Bob and David.
