accumulo-dev mailing list archives

From Josh Elser <josh.el...@gmail.com>
Subject Re: total table rows
Date Mon, 09 Nov 2015 15:52:24 GMT
Note that CountingIterator is in the system iterator package 
(FirstEntryInRowIterator also isn't in the user package for iterators, 
so its stability is a little questionable too). I think David ran into 
this a long time ago as well.

Stable versions of both of these would be good, IMO. It isn't like Z is 
the first one to ask how to count the unique rows :)

William Slacum wrote:
> Pranked... you can't use a CountingIterator, because it can't be init'd.
> Can we get rid of that limitation?
>
> On Mon, Nov 9, 2015 at 10:43 AM, William Slacum <wslacum@gmail.com> wrote:
>
>> An iterator stack of FirstEntryInRowIterator + CountingIterator will
>> return the count of rows in each tablet, which can then be combined on the
>> client side.
>>
>> On Mon, Nov 9, 2015 at 10:25 AM, Josh Elser <josh.elser@gmail.com> wrote:
>>
>>> Yeah, there's no explicit tracking of all rows in Accumulo; you're stuck
>>> with enumerating them (or explicitly tracking them yourself at ingest time).
>>>
>>> The easiest approach you can take is probably using the
>>> FirstEntryInRowIterator and counting each row on the client-side.
>>>
>>> You could do another summation in a second iterator, but this is a little
>>> tricky to get correct. I tried to touch on this a little in a blog post[1].
>>> If this is a one-off question you want to answer, doing the summation on
>>> the client side is unlikely to take much longer than a server-side
>>> summation.
>>>
>>> [1]
>>> https://blogs.apache.org/accumulo/entry/thinking_about_reads_over_accumulo
>>>
>>>
>>> z11373 wrote:
>>>
>>>> I want to get the total number of rows in a table (it likely has more
>>>> than 100M rows). I think that to get that information, Accumulo would
>>>> have to iterate over all rows :-( This may not be a typical Accumulo
>>>> scenario.
>>>>
>>>> Is there a more efficient way to get the total number of rows in a table?
>>>> When Accumulo iterates over those items, does that mean it pulls the
>>>> data to the client? If so, is there a way to ask it to return just the
>>>> number, since that's the only data I care about?
>>>>
>>>> Thanks,
>>>> Z
>
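
For anyone skimming the thread, the approach discussed above (keep only the first entry of each row, count what is left, then sum the per-tablet counts on the client) can be sketched roughly as follows. This is a plain-Java model of the logic only, not Accumulo API code: the Entry class, the helper methods, and the sample data are all hypothetical, and it assumes entries arrive sorted by row and that no row is split between tablets. It also ignores the trickier parts of doing the summation server-side that were mentioned above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RowCountSketch {

    /** Hypothetical stand-in for a key/value entry: only the row ID and a column name. */
    public static final class Entry {
        public final String row;
        public final String column;

        public Entry(String row, String column) {
            this.row = row;
            this.column = column;
        }
    }

    /** Keeps only the first entry of each row. Assumes the input is sorted by row. */
    public static List<Entry> firstEntryInRow(List<Entry> sortedEntries) {
        List<Entry> firsts = new ArrayList<>();
        String lastRow = null;
        for (Entry e : sortedEntries) {
            if (!e.row.equals(lastRow)) {
                firsts.add(e);      // first time we see this row
                lastRow = e.row;
            }
        }
        return firsts;
    }

    /** Row count for one tablet: one surviving entry per row, so just count them. */
    public static long countRows(List<Entry> tabletEntries) {
        return firstEntryInRow(tabletEntries).size();
    }

    /** Client-side combination: sum the per-tablet row counts. */
    public static long totalRows(List<List<Entry>> tablets) {
        long total = 0;
        for (List<Entry> tablet : tablets) {
            total += countRows(tablet);
        }
        return total;
    }

    public static void main(String[] args) {
        // Made-up sample data: two "tablets", each sorted by row.
        List<Entry> tablet1 = Arrays.asList(
                new Entry("a", "x"), new Entry("a", "y"), new Entry("b", "x"));
        List<Entry> tablet2 = Arrays.asList(
                new Entry("c", "x"), new Entry("d", "x"), new Entry("d", "y"));

        System.out.println(totalRows(Arrays.asList(tablet1, tablet2))); // prints 4
    }
}
```

In the purely client-side variant, only the filtering step would run on the servers and the client would count the entries it receives; in the two-iterator variant, each tablet would hand back a single number and the client would only do the final sum.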
