accumulo-dev mailing list archives

From David Medinets <david.medin...@gmail.com>
Subject Re: total table rows
Date Mon, 09 Nov 2015 16:16:53 GMT
Shutting up now. :)

On Mon, Nov 9, 2015 at 11:06 AM, Josh Elser <josh.elser@gmail.com> wrote:

> The question was to compute the number of rows, not the number of entries.
> The metadata table does not track the number of rows.
>
> David Medinets wrote:
>
>> It's not recommended to read the Metadata table? When I needed the 'real'
>> number, I ran a compaction. When I needed an estimate I just read the
>> table. I also upgraded our ingest process to track numbers as a second
>> phase to avoid the need for compaction to get 'real' numbers.
>>
>> On Mon, Nov 9, 2015 at 10:52 AM, Josh Elser<josh.elser@gmail.com>  wrote:
>>
>>> Note that CountingIterator is in the system iterator package
>>> (FirstEntryInRowIterator also isn't in the user package for iterators, so
>>> its stability is a little questionable too). I think David ran into this a
>>> long time ago as well.
>>>
>>> Stable versions of both of these would be good, IMO. It isn't like Z is
>>> the first one to ask how to count the unique rows :)
>>>
>>>
>>> William Slacum wrote:
>>>
>>>> Pranked... you can't use a CountingIterator, because it can't be init'd.
>>>> Can we get rid of that limitation?
>>>>
>>>> On Mon, Nov 9, 2015 at 10:43 AM, William Slacum<wslacum@gmail.com>
>>>> wrote:
>>>>
>>>>> An iterator stack of FirstEntryInRowIterator + CountingIterator will
>>>>> return the count of rows in each tablet, which can then be combined on
>>>>> the client side.
>>>>>
>>>>> On Mon, Nov 9, 2015 at 10:25 AM, Josh Elser<josh.elser@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Yeah, there's no explicit tracking of all rows in Accumulo, you're stuck
>>>>>> with enumerating them (or explicitly tracking them yourself at ingest
>>>>>> time).
>>>>>>
>>>>>> The easiest approach you can take is probably using the
>>>>>> FirstEntryInRowIterator and counting each row on the client-side.
>>>>>>
>>>>>> You could do another summation in a second iterator but this is a little
>>>>>> tricky to get correct. I tried to touch on this a little in a blog post[1].
>>>>>> If this is a one-off question you want to answer, doing the summation on
>>>>>> the client side is likely not to take excessively longer than a
>>>>>> server-side summation.
>>>>>>
>>>>>> [1] https://blogs.apache.org/accumulo/entry/thinking_about_reads_over_accumulo
>>>>>>
>>>>>> z11373 wrote:
>>>>>>
>>>>>>> I want to get the total number of rows in a table (it likely has more
>>>>>>> than 100M rows). I think that to get that information, Accumulo would
>>>>>>> have to iterate over all rows :-( This may not be a typical Accumulo
>>>>>>> scenario.
>>>>>>>
>>>>>>> Is there a more efficient way to get the total number of rows in a table?
>>>>>>> When Accumulo is iterating over those items, does that mean it will pull
>>>>>>> the data to the client? If yes, is there a way to ask it to return just
>>>>>>> the number, since that's the only data I care about?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Z
>>>>>>>
>>>>>>> --
>>>>>>> View this message in context:
>>>>>>> http://apache-accumulo.1065345.n5.nabble.com/total-table-rows-tp15484.html
>>>>>>> Sent from the Developers mailing list archive at Nabble.com.
