accumulo-dev mailing list archives

From Josh Elser <josh.el...@gmail.com>
Subject Re: total table rows
Date Mon, 09 Nov 2015 17:18:17 GMT
No worries, just getting everyone on the same page :)

David Medinets wrote:
> Shutting up now. :)
>
> On Mon, Nov 9, 2015 at 11:06 AM, Josh Elser<josh.elser@gmail.com>  wrote:
>
>> The question was to compute the number of rows, not the number of entries.
>> The metadata table does not track the number of rows.
>>
>> David Medinets wrote:
>>
>>> It's not recommended to read the Metadata table? When I needed the 'real'
>>> number, I ran a compaction. When I needed an estimate I just read the
>>> table. I also upgraded our ingest process to track numbers as a second
>>> phase to avoid the need for compaction to get 'real' numbers.
>>>
>>> On Mon, Nov 9, 2015 at 10:52 AM, Josh Elser<josh.elser@gmail.com>   wrote:
>>>
>>>> Note that CountingIterator is in the system iterator package
>>>> (FirstEntryInRowIterator also isn't in the user package for iterators, so
>>>> its stability is a little questionable too). I think David ran into this
>>>> a long time ago as well.
>>>>
>>>> Stable versions of both of these would be good, IMO. It isn't like Z is
>>>> the first one to ask how to count the unique rows :)
>>>>
>>>>
>>>> William Slacum wrote:
>>>>
>>>>> Pranked... you can't use a CountingIterator, because it can't be init'd.
>>>>> Can we get rid of that limitation?
>>>>>
>>>>> On Mon, Nov 9, 2015 at 10:43 AM, William Slacum<wslacum@gmail.com> wrote:
>>>>>
>>>>>> An iterator stack of FirstEntryInRowIterator + CountingIterator will
>>>>>> return the count of rows in each tablet, which can then be combined on
>>>>>> the client side.
>>>>>>
>>>>>> On Mon, Nov 9, 2015 at 10:25 AM, Josh Elser<josh.elser@gmail.com> wrote:
>>>>>>
>>>>>>> Yeah, there's no explicit tracking of all rows in Accumulo, you're stuck
>>>>>>> with enumerating them (or explicitly tracking them yourself at ingest
>>>>>>> time).
>>>>>>>
>>>>>>> The easiest approach you can take is probably using the
>>>>>>> FirstEntryInRowIterator and counting each row on the client-side.
>>>>>>>
>>>>>>> You could do another summation in a second iterator but this is a
>>>>>>> little tricky to get correct. I tried to touch on this a little in a
>>>>>>> blog post[1].
>>>>>>> If this is a one-off question you want to answer, doing the summation
>>>>>>> on the client side is likely not to take excessively longer than a
>>>>>>> server-side summation.
>>>>>>>
>>>>>>> [1] https://blogs.apache.org/accumulo/entry/thinking_about_reads_over_accumulo
>>>>>>>
>>>>>>> z11373 wrote:
>>>>>>>
>>>>>>>> I want to get total rows of a table (likely has more than 100M rows), I
>>>>>>>> think to get that information, Accumulo would have to iterate all rows :-(
>>>>>>>> This may not be typical Accumulo scenario.
>>>>>>>>
>>>>>>>> Is there a more efficient way to get total number of rows in a table?
>>>>>>>> When Accumulo iterating those items, does it mean it will pull the data
>>>>>>>> to the client? If yes, is there a way to ask it to return just the
>>>>>>>> number, since that's the only data I care.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Z
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> View this message in context:
>>>>>>>>
>>>>>>>>
>>>>>>>> http://apache-accumulo.1065345.n5.nabble.com/total-table-rows-tp15484.html
>>>>>>>> Sent from the Developers mailing list archive at Nabble.com.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>
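For anyone following along, the idea discussed in this thread (keep only the first entry of each row within a tablet, count those, then add the per-tablet counts on the client) can be sketched in plain Java without any Accumulo dependency. This is only a model of the logic, not the real iterator API: RowCountSketch, Entry, countRowsInTablet and countRows are hypothetical names made up for illustration, and the sketch assumes entries arrive sorted by row and that a row never spans two tablets.

```java
import java.util.Arrays;
import java.util.List;

final class RowCountSketch {

    /** Hypothetical stand-in for an Accumulo entry; only the row ID matters for counting. */
    static final class Entry {
        final String row;
        final String column;

        Entry(String row, String column) {
            this.row = row;
            this.column = column;
        }
    }

    /**
     * Models a first-entry-in-row filter followed by a counter over one tablet.
     * Entries are assumed to be sorted by row, so a new row starts whenever the
     * row ID differs from the previous entry's row ID.
     */
    static long countRowsInTablet(List<Entry> sortedEntries) {
        long rows = 0;
        String previousRow = null;
        for (Entry e : sortedEntries) {
            if (!e.row.equals(previousRow)) {
                rows++;
                previousRow = e.row;
            }
        }
        return rows;
    }

    /** Client-side step: add up the partial counts, one per tablet. */
    static long countRows(List<List<Entry>> tablets) {
        long total = 0;
        for (List<Entry> tablet : tablets) {
            total += countRowsInTablet(tablet);
        }
        return total;
    }

    public static void main(String[] args) {
        // Two made-up tablets: rows "a" and "b" in the first, "c" and "d" in the second.
        List<Entry> tablet1 = Arrays.asList(
            new Entry("a", "x"), new Entry("a", "y"), new Entry("b", "x"));
        List<Entry> tablet2 = Arrays.asList(
            new Entry("c", "x"), new Entry("c", "y"), new Entry("c", "z"), new Entry("d", "x"));
        System.out.println(countRows(Arrays.asList(tablet1, tablet2)));
    }
}
```

Summing per-tablet counts is only correct under the stated assumption that no row is split across tablets; if the same row could show up in two partial results, it would be counted twice.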
