accumulo-user mailing list archives

From Keith Turner <ke...@deenlo.com>
Subject Re: how to use CountingIterator to count records?
Date Wed, 06 Jun 2012 18:02:38 GMT
On Wed, Jun 6, 2012 at 1:46 PM, William Slacum <wslacum@gmail.com> wrote:
> You're kind of there. Essentially, you can think of your Scanner's
> interactions with the TServers as a tree with a height of two. Your

One comment to add.  A Scanner will do this work serially, one
tablet server at a time.  A BatchScanner would run the iterator
on multiple tablet servers in parallel.
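
To make the serial vs. parallel difference concrete, here is a rough
sketch in plain Java. It does not use any Accumulo API: the class
TabletCountSketch, its methods, and the in-memory lists standing in for
tablet servers are all hypothetical. It only illustrates the shape of
the computation, where each server produces a partial count and the
client sums them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-in, not Accumulo API: each inner list models the entries
// that one tablet server would otherwise send back to the client.
final class TabletCountSketch {

    // Models a server-side counting step: return one number, not the entries.
    static long countOnServer(List<String> entries) {
        return entries.size();
    }

    // Scanner-like behaviour: visit one server at a time and sum the partial counts.
    static long serialTotal(List<List<String>> servers) {
        long total = 0;
        for (List<String> entries : servers) {
            total += countOnServer(entries);
        }
        return total;
    }

    // BatchScanner-like behaviour: ask several servers concurrently, then sum.
    static long parallelTotal(List<List<String>> servers, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<Long>> tasks = new ArrayList<>();
            for (final List<String> entries : servers) {
                tasks.add(() -> countOnServer(entries));
            }
            long total = 0;
            // invokeAll blocks until every per-server count has finished.
            for (Future<Long> partial : pool.invokeAll(tasks)) {
                total += partial.get();
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while counting", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a per-server count failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<List<String>> servers = Arrays.asList(
                Arrays.asList("row1", "row2", "row3"),
                Arrays.asList("row4"),
                Arrays.asList("row5", "row6"));
        System.out.println("serial total:   " + serialTotal(servers));
        System.out.println("parallel total: " + parallelTotal(servers, 3));
    }
}
```

Both methods return the same total. The only difference is whether the
per-server counts are collected one after another or concurrently,
which is the trade-off between the two scanner types described above.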


> Scanner is the "root" and its children are all of the TServers it
> needs to interact with. Essentially, the operation you want is to
> sum the number of records each of the children has.
>
> In Accumulo terms, you can use something like a CountingIterator to
> count the number of results on each TServer. You can then sum all of
> those intermediate results to get a total count of results.
>
> On Wed, Jun 6, 2012 at 10:39 AM, Hunter Provyn <hunter@ccri.com> wrote:
>> I want to know the number of records a scanner has without actually getting
>> the records from cloudbase.
>> I've been looking at CountingIterator (1.3.4), which has a getCount()
>> method.  However, I don't know how
>> to access the instance to call getCount() on it because Cloudbase server
>> just passes back the entries and doesn't expose the instance of the
>> iterator.
>>
>> It is possible to use an AggregatingIterator to aggregate all entries into a
>> single entry whose value is the number of entries.  But I was wondering if
>> there was a better way that possibly makes use of the CountingIterator
>> class.
>>
