accumulo-user mailing list archives

From Eric Newton <eric.new...@gmail.com>
Subject Re: How to get count of table rows using accumulo shell
Date Fri, 11 Oct 2013 19:24:05 GMT
Ya, you'll want to remove the iterator after you do the count.  You
might be able to use it as a scan-only iterator, but I was just being
lazy.

-Eric


On Fri, Oct 11, 2013 at 3:18 PM, Terry P. <texpilot@gmail.com> wrote:
> Thanks Eric, Jared, and Josh.
>
> From Jared's reply I realize that the setiter command obviously stays in
> effect beyond my shell session.  I see it now with the listiter command in
> the shell.
>
> Our app normally does lookups by rowkey.  Will the firstEntry iterator
> adversely affect those queries?  I assume not, but I want to double check.
>
> Thanks again guys, this is very helpful,
> Terry
>
>
>
> On Fri, Oct 11, 2013 at 2:15 PM, Eric Newton <eric.newton@gmail.com> wrote:
>>
>> Actually, the egrep was used on purpose: it's the only way to get the
>> shell to use the BatchScanner, which can talk to multiple tservers at
>> once.
>>
>> -Eric
>>
>>
>> On Fri, Oct 11, 2013 at 3:10 PM, Josh Elser <josh.elser@gmail.com> wrote:
>> > You'll need to add the '-np' option on the scan command as well.
>> >
>> >
>> > On 10/11/2013 03:05 PM, Jared Winick wrote:
>> >>
>> >> After following the commands Eric lists to set the iterator for that
>> >> table, instead of running 'egrep' in the shell, you could do this from
>> >> the Linux command line:
>> >>
>> >> accumulo shell -u username -p password -e "scan -t foo" | wc -l
>> >>
>> >>
>> >> On Fri, Oct 11, 2013 at 11:42 AM, Eric Newton <eric.newton@gmail.com> wrote:
>> >>
>> >>     You can stack a counting Combiner over the FirstEntryInRowIterator and
>> >>     batch scan the table. If it's just a test data set with under a
>> >>     billion rows, you can just count the result set coming out of the
>> >>     FirstEntryInRowIterator.  You'll be I/O bound at the client, but it
>> >>     will work.
>> >>
>> >>     This does it with the shell, but the output is kinda voluminous:
>> >>
>> >>     root@test> createtable foo
>> >>     root@test foo> insert row1 cf col1 value
>> >>     root@test foo> insert row1 cf col2 value
>> >>     root@test foo> insert row1 cf col999 value
>> >>     root@test foo> insert row2 cf col1 value
>> >>     root@test foo> scan
>> >>     row1 cf:col1 []    value
>> >>     row1 cf:col2 []    value
>> >>     row1 cf:col999 []    value
>> >>     row2 cf:col1 []    value
>> >>     root@test foo> setiter -class org.apache.accumulo.core.iterators.FirstEntryInRowIterator -p 99 -scan
>> >>     Only allows iteration over the first entry per row
>> >>     ----------> set FirstEntryInRowIterator parameter scansBeforeSeek,
>> >>     Number of scans to try before seeking [10]: 10
>> >>     root@test foo> egrep .*
>> >>     row1 cf:col1 []    value
>> >>     row2 cf:col1 []    value
>> >>
>> >>
>> >>     On Fri, Oct 11, 2013 at 10:53 AM, Terry P. <texpilot@gmail.com> wrote:
>> >>     > Hi guys,
>> >>     > I'm still a bit of a newbie as I'm more of an admin than a developer, and
>> >>     > now that formal testing has begun, I have testers asking me how to get a
>> >>     > total count of records in Accumulo for verification purposes after test
>> >>     > ingests have been run.
>> >>     >
>> >>     > In our case when I say "records" I mean the number of distinct rowkeys, not
>> >>     > the total number of entries.
>> >>     >
>> >>     > Is there any way to do this using just the Accumulo shell, maybe by writing
>> >>     > an aggregator or other class that can be run from within the Accumulo shell?
>> >>     >
>> >>     > Many thanks in advance,
>> >>     > Terry
>> >>     >
>> >>     >
>> >>     > On Tue, Jan 22, 2013 at 6:03 PM, Terry P. <texpilot@gmail.com> wrote:
>> >>     >>
>> >>     >> Greetings everyone,
>> >>     >> I want to simply get the total count of rows in a table using the accumulo
>> >>     >> shell.  I'm very new to Accumulo so I apologize if it's a newbie question.
>> >>     >>
>> >>     >> I'm prototyping with the accumulo shell, and love how it can ingest
>> >>     >> records using exefile, so I've used python to generate a lot of test data.
>> >>     >> For some test cases in this sprint I need to verify the rows loaded match
>> >>     >> what's expected, hence the reason I need to get the total rows in a table.
>> >>     >>
>> >>     >> I'd bet there is some way to use setiter or setscaniter with the -agg
>> >>     >> option, but I can't figure it out.
>> >>     >>
>> >>     >> Any help would be greatly appreciated.
>> >>     >>
>> >>     >> Best regards,
>> >>     >> Terry
>> >>     >
>> >>     >
>> >>
>> >>
>> >
>
>
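
The command-line variant discussed in the thread (a scan with '-np', piped to a
counter) can be sketched as below. This is only an illustration, not something
posted by the participants: the helper name count_rows is made up, and it
assumes each scan entry is printed on one line with the row ID as the first
whitespace-separated field (as in the transcript), with no whitespace inside
row IDs. Unlike 'wc -l', it counts distinct row IDs on the client, so it should
give the same number whether or not the FirstEntryInRowIterator is set on the
table; without the iterator every entry is shipped to the client, which is
slower.

```shell
#!/bin/sh
# count_rows (hypothetical helper): read Accumulo shell scan output on stdin
# and print the number of distinct row IDs.
# Assumption: one entry per line, row ID is the first whitespace-separated
# field, and row IDs contain no whitespace.
count_rows() {
    awk 'NF >= 2 { rows[$1] = 1 }                  # remember each row ID once
         END     { n = 0; for (r in rows) n++; print n }'
}

# Example invocation (username, password and the table name foo are
# placeholders). -np turns off pagination so the scan does not pause.
#
#   accumulo shell -u username -p password -e "scan -t foo -np" | count_rows
```

Any extra lines the shell prints with at least two fields would be counted as
rows, so check the raw output once before trusting the number.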
