accumulo-user mailing list archives

From Bob.Thor...@l-3com.com
Subject RE: how to use CountingIterator to count records?
Date Thu, 07 Jun 2012 12:55:38 GMT
Hunter

If you have access to the ingest of this data, have you considered implementing an Edge Table
to keep the count, keyed on a document partition index (or a similar aggregate key)? I have
to track the same statistic and have moved to the Edge Table approach, which gives a direct
lookup of the number of occurrences.
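
A minimal sketch of the idea of keeping the count at ingest time, written in plain Java rather than against the Accumulo API. The count for a key is bumped as each record is ingested, so reading it back is a single lookup rather than a scan. The class name `IngestCounts`, the `TreeMap` standing in for the count table, and the method names are all hypothetical and only illustrate the bookkeeping.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for a count table keyed by partition (or another aggregate key).
class IngestCounts {
    // A sorted in-memory map used purely for illustration; not a real table.
    private final Map<String, Long> counts = new TreeMap<>();

    // Call once per record at ingest time: add 1 to that record's partition count.
    void recordIngested(String partitionKey) {
        counts.merge(partitionKey, 1L, Long::sum);
    }

    // Direct lookup of how many records were ingested for a partition (0 if none).
    long count(String partitionKey) {
        return counts.getOrDefault(partitionKey, 0L);
    }
}
```

The trade-off is extra work on every write in exchange for a cheap read, which only pays off if the count is read often and the ingest path is under your control.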

-----Original Message-----
From: Keith Turner [mailto:keith@deenlo.com] 
Sent: Wednesday, June 06, 2012 13:03
To: user@accumulo.apache.org
Subject: Re: how to use CountingIterator to count records?

On Wed, Jun 6, 2012 at 1:46 PM, William Slacum <wslacum@gmail.com> wrote:
> You're kind of there. Essentially, you can think of your Scanner's 
> interactions with the TServers as a tree with a height of two. Your

One comment to add: the Scanner will do this work serially, one tablet server at a time.
The BatchScanner would execute the iterator in parallel on multiple tablet servers.
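
To make the serial versus parallel distinction concrete, here is a rough sketch in plain Java, not Accumulo client code. Each `Supplier<Long>` is a hypothetical stand-in for "ask one tablet server for its count"; the class and method names are invented for illustration, and the thread pool only mimics the general shape of fanning requests out to several servers at once.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Supplier;

class SerialVsParallelCount {
    // Serial: query one (simulated) server after another and add up the counts.
    static long countSerially(List<Supplier<Long>> perServerCounts) {
        long total = 0;
        for (Supplier<Long> server : perServerCounts) {
            total += server.get();
        }
        return total;
    }

    // Parallel: start every per-server count on a thread pool, then sum the results.
    static long countInParallel(List<Supplier<Long>> perServerCounts, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<CompletableFuture<Long>> pending = new ArrayList<>();
            for (Supplier<Long> server : perServerCounts) {
                pending.add(CompletableFuture.supplyAsync(server, pool));
            }
            long total = 0;
            for (CompletableFuture<Long> result : pending) {
                total += result.join(); // blocks until that server's count is ready
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }
}
```

Both methods return the same total; the difference is only in how long the caller waits when several servers each take a while to answer.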


> Scanner is the "root" and its children are all of the TServers it 
> needs to interact with. Essentially, the operation you want is to 
> sum the number of records each of the children has.
>
> In Accumulo terms, you can use something like a CountingIterator to 
> count the number of results on each TServer. You can then sum all of 
> those intermediate results to get a total count of results.
>
> On Wed, Jun 6, 2012 at 10:39 AM, Hunter Provyn <hunter@ccri.com> wrote:
>> I want to know the number of records a scanner has without actually 
>> getting the records from Cloudbase.
>> I've been looking at CountingIterator (1.3.4), which has a getCount() 
>> method.  However, I don't know how to access the instance to call 
>> getCount() on it because Cloudbase server just passes back the 
>> entries and doesn't expose the instance of the iterator.
>>
>> It is possible to use an AggregatingIterator to aggregate all entries 
>> into a single entry whose value is the number of entries.  But I was 
>> wondering if there was a better way that possibly makes use of the 
>> CountingIterator class.
>>
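
The two-level tree described in the quoted reply can be sketched in plain Java as follows. This is an illustration under stated assumptions, not the CountingIterator API: `countLocalEntries` is a hypothetical stand-in for whatever runs on each tablet server and returns only a number, and `totalCount` is the client-side sum of those intermediate results. The map of server name to entries is made up for the example.

```java
import java.util.List;
import java.util.Map;

class TwoLevelCount {
    // "Server side": walk the local entries and return only how many there were.
    static long countLocalEntries(Iterable<String> localEntries) {
        long n = 0;
        for (String ignored : localEntries) {
            n++;
        }
        return n;
    }

    // "Client side": sum the per-server intermediate counts into one total.
    static long totalCount(Map<String, List<String>> entriesByServer) {
        long total = 0;
        for (List<String> local : entriesByServer.values()) {
            total += countLocalEntries(local);
        }
        return total;
    }
}
```

The point of the split is that only one number per server needs to travel back to the client, rather than the entries themselves.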
