accumulo-user mailing list archives

From Eric Newton <>
Subject Re: How to get count of table rows using accumulo shell
Date Fri, 11 Oct 2013 17:42:20 GMT
You can stack a counting Combiner over the FirstEntryInRowIterator and
batch scan the table. If it's only a test data set with under a
billion rows, you can simply count the result set coming out of the
FirstEntryInRowIterator. You'll be I/O bound at the client, but it
will work.
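The client-side counting can be sketched in plain Python. This is not the Accumulo API. It simulates a scan over a hypothetical in-memory list of (row, family, qualifier, value) tuples, sorted the way a scan returns them. It then applies a first-entry-per-row filter in the spirit of FirstEntryInRowIterator and counts what comes out. The names `first_entry_per_row` and `count_rows` are illustrative only.

```python
from itertools import groupby
from operator import itemgetter


def first_entry_per_row(entries):
    """Yield only the first entry of each row, roughly what
    FirstEntryInRowIterator lets through. Assumes entries are sorted by row."""
    for _row, group in groupby(entries, key=itemgetter(0)):
        yield next(group)  # groupby skips the rest of the row for us


def count_rows(entries):
    """Count distinct row IDs by counting the filtered stream, as a client would."""
    return sum(1 for _ in first_entry_per_row(entries))


# Hypothetical stand-in for table "foo" from the shell session below.
table = sorted([
    ("row1", "cf", "col1", "value"),
    ("row1", "cf", "col2", "value"),
    ("row1", "cf", "col999", "value"),
    ("row2", "cf", "col1", "value"),
])

print(list(first_entry_per_row(table)))
print(len(table), "entries,", count_rows(table), "rows")  # 4 entries, 2 rows
```

In a real cluster the filtering runs on the tablet servers, so only one entry per row reaches the client. That is why the client-side count stays I/O bound rather than proportional to the total number of entries.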

This does it with the shell, but the output is kinda voluminous:

root@test> createtable foo
root@test foo> insert row1 cf col1 value
root@test foo> insert row1 cf col2 value
root@test foo> insert row1 cf col999 value
root@test foo> insert row2 cf col1 value
root@test foo> scan
row1 cf:col1 []    value
row1 cf:col2 []    value
row1 cf:col999 []    value
row2 cf:col1 []    value
root@test foo> setiter -class org.apache.accumulo.core.iterators.FirstEntryInRowIterator -p 99 -scan
Only allows iteration over the first entry per row
----------> set FirstEntryInRowIterator parameter scansBeforeSeek,
Number of scans to try before seeking [10]: 10
root@test foo> egrep .*
row1 cf:col1 []    value
row2 cf:col1 []    value
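The scansBeforeSeek prompt trades cheap next() calls against a more expensive seek. As I read FirstEntryInRowIterator, it works like this after returning a row's first entry:

- It steps forward up to scansBeforeSeek times, looking for the next row.
- If it is still in the same row, it seeks to the key following that row.

The Python sketch below approximates that behavior on a sorted in-memory list of row IDs. It has one element per entry and omits columns, and the "seek" is a bisect_right jump. It tallies both kinds of operation. The function `scan_first_entries` is hypothetical and is not the iterator's actual code.

```python
from bisect import bisect_right


def scan_first_entries(rows, scans_before_seek=10):
    """Simulate first-entry-per-row filtering over a sorted list of row IDs.

    Returns (emitted_rows, next_calls, seeks). Approximates the
    step-then-seek strategy controlled by scansBeforeSeek.
    """
    emitted, nexts, seeks = [], 0, 0
    i = 0
    while i < len(rows):
        current = rows[i]
        emitted.append(current)  # return this row's first entry
        count = 0
        while i < len(rows) and rows[i] == current:
            if count < scans_before_seek:
                count += 1
                i += 1  # cheap step to the next entry
                nexts += 1
            else:
                count = 0
                i = bisect_right(rows, current)  # "seek" past the rest of the row
                seeks += 1
    return emitted, nexts, seeks


# Table "foo" from the session above: narrow rows, so stepping is enough.
print(scan_first_entries(["row1", "row1", "row1", "row2"]))  # (['row1', 'row2'], 4, 0)
# One wide row: after 10 steps the simulation gives up and seeks.
print(scan_first_entries(["a"] * 1000 + ["b"] * 2))  # (['a', 'b'], 12, 1)
```

With many narrow rows, stepping is cheaper than seeking. With a few very wide rows, a seek avoids walking every column. The default of 10 offered by the shell prompt sits between the two.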

On Fri, Oct 11, 2013 at 10:53 AM, Terry P. <> wrote:
> Hi guys,
> I'm still a bit of a newbie as I'm more of an admin than a developer, and
> now that formal testing has begun, I have testers asking me how to get a
> total count of records in Accumulo for verification purposes after test
> ingests have been run.
> In our case when I say "records" I mean the number of distinct rowkeys, not
> the total number of entries.
> Is there any way to do this using just the Accumulo shell, maybe by writing
> an aggregator or other class that can be run from within the Accumulo shell?
> Many thanks in advance,
> Terry
> On Tue, Jan 22, 2013 at 6:03 PM, Terry P. <> wrote:
>> Greetings everyone,
>> I want to simply get the total count of rows in a table using the accumulo
>> shell.  I'm very new to Accumulo so I apologize if it's a newbie question.
>> I'm prototyping with the accumulo shell, and love how it can ingest
>> records using exefile, so I've used python to generate a lot of test data.
>> For some test cases in this sprint I need to verify the rows loaded match
>> what's expected, hence the reason I need to get the total rows in a table.
>> I'd bet there is some way to use setiter or setscaniter with the -agg
>> option, but I can't figure it out.
>> Any help would be greatly appreciated.
>> Best regards,
>> Terry
