accumulo-dev mailing list archives

From Josh Elser
Subject Re: total table rows
Date Mon, 09 Nov 2015 16:06:41 GMT
The question was to compute the number of rows, not the number of 
entries. The metadata table does not track the number of rows.

David Medinets wrote:
> It's not recommended to read the Metadata table? When I needed the 'real'
> number, I ran a compaction. When I needed an estimate I just read the
> table. I also upgraded our ingest process to track numbers as a second
> phase to avoid the need for compaction to get 'real' numbers.
> On Mon, Nov 9, 2015 at 10:52 AM, Josh Elser wrote:
>> Note that CountingIterator is in the system iterator package
>> (FirstEntryInRowIterator also isn't in the user package for iterators, so
>> its stability is a little questionable too). I think David ran into this a
>> long time ago as well.
>> Stable versions of both of these would be good, IMO. It isn't like Z is
>> the first one to ask how to count the unique rows :)
>> William Slacum wrote:
>>> Pranked... you can't use a CountingIterator, because it can't be init'd.
>>> Can we get rid of that limitation?
>>> On Mon, Nov 9, 2015 at 10:43 AM, William Slacum wrote:
>>>> An iterator stack of FirstEntryInRowIterator + CountingIterator will
>>>> return the count of rows in each tablet, which can then be combined on
>>>> the client side.
>>>> On Mon, Nov 9, 2015 at 10:25 AM, Josh Elser wrote:
>>>> Yeah, there's no explicit tracking of all rows in Accumulo, you're stuck
>>>>> with enumerating them (or explicitly tracking them yourself at ingest
>>>>> time).
>>>>> The easiest approach you can take is probably using the
>>>>> FirstEntryInRowIterator and counting each row on the client-side.
>>>>> You could do another summation in a second iterator but this is a little
>>>>> tricky to get correct. I tried to touch on this a little in a blog
>>>>> post[1].
>>>>> If this is a one-off question you want to answer, doing the summation on
>>>>> the client side is likely not to take excessively longer than a
>>>>> server-side summation.
>>>>> [1]
>>>>> z11373 wrote:
>>>>>> I want to get total rows of a table (likely has more than 100M rows), and
>>>>>> I think to get that information, Accumulo would have to iterate all rows.
>>>>>> This may not be typical Accumulo scenario.
>>>>>> Is there a more efficient way to get total number of rows in a table?
>>>>>> When Accumulo is iterating those items, does it mean it will pull the data
>>>>>> to the client? If yes, is there a way to ask it to return just the number,
>>>>>> since that's the only data I care about.
>>>>>> Thanks,
>>>>>> Z
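
For readers following the thread, the difference between counting rows and counting entries, and the idea of combining per-tablet row counts on the client, can be sketched without a running cluster. This is only an illustration in plain Java, not Accumulo API: the Entry class, countRows, combine, and the sample tablets are hypothetical stand-ins. It assumes that entries within a tablet arrive sorted by row and that a row never spans two tablets.

```java
import java.util.Arrays;
import java.util.List;

class RowCountSketch {
    // Hypothetical stand-in for a key/value entry: a row ID plus a column name.
    static final class Entry {
        final String row;
        final String column;
        Entry(String row, String column) {
            this.row = row;
            this.column = column;
        }
    }

    // Counts distinct rows in a row-sorted list of entries by counting only
    // the first entry seen for each row (assumption: input is sorted by row).
    static long countRows(List<Entry> sortedEntries) {
        long rows = 0;
        String previousRow = null;
        for (Entry e : sortedEntries) {
            if (!e.row.equals(previousRow)) {
                rows++;
                previousRow = e.row;
            }
        }
        return rows;
    }

    // Client-side combination: sums the per-tablet row counts.
    // This is only correct if no row appears in more than one tablet.
    static long combine(List<Long> perTabletCounts) {
        long total = 0;
        for (long c : perTabletCounts) {
            total += c;
        }
        return total;
    }

    public static void main(String[] args) {
        // Two hypothetical tablets: rows "a" and "b" in one, row "c" in the other.
        List<Entry> tabletA = Arrays.asList(
            new Entry("a", "c1"), new Entry("a", "c2"), new Entry("b", "c1"));
        List<Entry> tabletB = Arrays.asList(
            new Entry("c", "c1"), new Entry("c", "c2"), new Entry("c", "c3"));

        long entries = tabletA.size() + tabletB.size();
        long rows = combine(Arrays.asList(countRows(tabletA), countRows(tabletB)));

        // prints: entries=6 rows=3
        System.out.println("entries=" + entries + " rows=" + rows);
    }
}
```

The entry count (6) and the row count (3) differ here, which is why an entry total cannot stand in for a row total.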
