accumulo-user mailing list archives

From: Eric Newton <eric.new...@gmail.com>
Subject: Re: How to get count of table rows using accumulo shell
Date: Fri, 11 Oct 2013 19:15:34 GMT
Actually, the egrep was used on purpose: it's the only way to get the
shell to use the BatchScanner, which can talk to multiple tservers at
once.

-Eric
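
A non-interactive version of the same idea, combining the egrep/BatchScanner point with the '-np' and 'wc -l' suggestions in the quoted replies, might look like the sketch below. It rests on assumptions that are not from the thread: the FirstEntryInRowIterator has already been attached to the table with `setiter ... -scan` (so each row yields exactly one output line), egrep accepts the same `-t` and `-np` options as scan, the shell prints only key/value lines on stdout, and the function name `count_rows` and the ACCUMULO_USER / ACCUMULO_PASS variables are hypothetical.

```shell
#!/bin/sh
# Sketch only: count rows of an Accumulo table from the Linux command line.
# Hypothetical names: count_rows, ACCUMULO_USER, ACCUMULO_PASS.
# Assumes FirstEntryInRowIterator is already set on the table via
# `setiter ... -scan`, so every row produces exactly one line of output.
count_rows() {
  table="$1"
  # egrep (rather than scan) makes the shell use a BatchScanner, which can
  # read from several tablet servers at once; -np turns off paging so the
  # shell does not stop and wait for a keypress.
  accumulo shell -u "$ACCUMULO_USER" -p "$ACCUMULO_PASS" \
    -e "egrep .* -t $table -np" |
    wc -l |
    tr -d ' '   # strip the padding some wc implementations add
}
```

With the two variables exported, `count_rows foo` should print 2 for the example table in the transcript further down, under the assumptions above.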


On Fri, Oct 11, 2013 at 3:10 PM, Josh Elser <josh.elser@gmail.com> wrote:
> You'll need to add the '-np' (no pagination) option to the scan command as well.
>
>
> On 10/11/2013 03:05 PM, Jared Winick wrote:
>>
>> After following the commands Eric lists to set the iterator for that
>> table, instead of running 'egrep' in the shell, you could do this from the
>> Linux command line:
>>
>> accumulo shell -u username -p password -e "scan -t foo" | wc -l
>>
>>
>> On Fri, Oct 11, 2013 at 11:42 AM, Eric Newton <eric.newton@gmail.com> wrote:
>>
>>     You can stack a counting Combiner over the FirstEntryInRowIterator and
>>     batch scan the table. If it's just a test data set with under a
>>     billion rows, you can just count the result set coming out of the
>>     FirstEntryInRowIterator.  You'll be I/O bound at the client, but it
>>     will work.
>>
>>     This does it with the shell, but the output is kinda voluminous:
>>
>>     root@test> createtable foo
>>     root@test foo> insert row1 cf col1 value
>>     root@test foo> insert row1 cf col2 value
>>     root@test foo> insert row1 cf col999 value
>>     root@test foo> insert row2 cf col1 value
>>     root@test foo> scan
>>     row1 cf:col1 []    value
>>     row1 cf:col2 []    value
>>     row1 cf:col999 []    value
>>     row2 cf:col1 []    value
>>     root@test foo> setiter -class org.apache.accumulo.core.iterators.FirstEntryInRowIterator -p 99 -scan
>>     Only allows iteration over the first entry per row
>>     ----------> set FirstEntryInRowIterator parameter scansBeforeSeek, Number of scans to try before seeking [10]: 10
>>     root@test foo> egrep .*
>>     row1 cf:col1 []    value
>>     row2 cf:col1 []    value
>>
>>
>>     On Fri, Oct 11, 2013 at 10:53 AM, Terry P. <texpilot@gmail.com> wrote:
>>     > Hi guys,
>>     > I'm still a bit of a newbie as I'm more of an admin than a developer,
>>     > and now that formal testing has begun, I have testers asking me how to
>>     > get a total count of records in Accumulo for verification purposes
>>     > after test ingests have been run.
>>     >
>>     > In our case when I say "records" I mean the number of distinct
>>     > rowkeys, not the total number of entries.
>>     >
>>     > Is there any way to do this using just the Accumulo shell, maybe by
>>     > writing an aggregator or other class that can be run from within the
>>     > Accumulo shell?
>>     >
>>     > Many thanks in advance,
>>     > Terry
>>     >
>>     >
>>     > On Tue, Jan 22, 2013 at 6:03 PM, Terry P. <texpilot@gmail.com> wrote:
>>     >>
>>     >> Greetings everyone,
>>     >> I want to simply get the total count of rows in a table using the
>>     >> accumulo shell.  I'm very new to Accumulo so I apologize if it's a
>>     >> newbie question.
>>     >>
>>     >> I'm prototyping with the accumulo shell, and love how it can ingest
>>     >> records using exefile, so I've used python to generate a lot of test
>>     >> data.  For some test cases in this sprint I need to verify the rows
>>     >> loaded match what's expected, hence the reason I need to get the total
>>     >> rows in a table.
>>     >>
>>     >> I'd bet there is some way to use setiter or setscaniter with the -agg
>>     >> option, but I can't figure it out.
>>     >>
>>     >> Any help would be greatly appreciated.
>>     >>
>>     >> Best regards,
>>     >> Terry
>>     >
>>     >
>>
>>
>
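
If you would rather not attach the FirstEntryInRowIterator to the table, the distinct rows Terry asks about can also be counted entirely on the client from plain scan output. This is a sketch under assumptions: each entry prints as `row cf:cq [visibility]    value`, row IDs contain no whitespace (so the row is the first field of each line), and the function name `count_distinct_rows` is hypothetical. Every entry crosses the network, so, as Eric notes, you will be I/O bound at the client; it is only sensible for test-sized tables.

```shell
#!/bin/sh
# Sketch only: count distinct row IDs in Accumulo shell output read from
# stdin, without relying on FirstEntryInRowIterator.
# Hypothetical name: count_distinct_rows.
# Assumes row IDs contain no whitespace, so the row is the first field.
count_distinct_rows() {
  awk 'NF { print $1 }' |  # row ID of every non-blank line
    LC_ALL=C sort -u |     # one line per distinct row, in any input order
    wc -l |
    tr -d ' '              # strip the padding some wc implementations add
}
```

Usage: `accumulo shell -u username -p password -e "scan -t foo -np" | count_distinct_rows`. On the four entries in Eric's transcript (rows row1 and row2) this counts 2. `sort -u` is used instead of `uniq` because, although scan output is sorted by key, BatchScanner-based output such as egrep's comes back in no particular order; the sort costs time but gives the same count either way.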
