accumulo-user mailing list archives

From "Terry P." <texpi...@gmail.com>
Subject Re: How to get count of table rows using accumulo shell
Date Fri, 11 Oct 2013 19:18:29 GMT
Thanks Eric, Jared, and Josh.

From Jared's reply I realize that the setiter command stays in effect
beyond my shell session.  I can see it now with the listiter command in
the shell.

Our app normally does lookups by rowkey.  Will the firstEntry iterator
adversely affect those queries?  I assume not, but I want to double check.

Thanks again guys, this is very helpful,
Terry



On Fri, Oct 11, 2013 at 2:15 PM, Eric Newton <eric.newton@gmail.com> wrote:

> Actually, the egrep was used on purpose: it's the only way to get the
> shell to use the BatchScanner, which can talk to multiple tservers at
> once.
>
> -Eric
>
>
> On Fri, Oct 11, 2013 at 3:10 PM, Josh Elser <josh.elser@gmail.com> wrote:
> > You'll need to add the '-np' option on the scan command as well.
> >
> >
> > On 10/11/2013 03:05 PM, Jared Winick wrote:
> >>
> >> After following the commands Eric lists to set the iterator for that
> >> table, instead of running 'egrep' in the shell, you could do this from the
> >> Linux command line
> >>
> >> accumulo shell -u username -p password -e "scan -t foo" | wc -l
> >>
> >>
> >> On Fri, Oct 11, 2013 at 11:42 AM, Eric Newton <eric.newton@gmail.com> wrote:
> >>
> >>     You can stack a counting Combiner over the FirstEntryInRowIterator and
> >>     batch scan the table. If it's just a test data set with under a
> >>     billion rows, you can just count the result set coming out of the
> >>     FirstEntryInRowIterator.  You'll be I/O bound at the client, but it
> >>     will work.
> >>
> >>     This does it with the shell, but the output is kinda voluminous:
> >>
> >>     root@test> createtable foo
> >>     root@test foo> insert row1 cf col1 value
> >>     root@test foo> insert row1 cf col2 value
> >>     root@test foo> insert row1 cf col999 value
> >>     root@test foo> insert row2 cf col1 value
> >>     root@test foo> scan
> >>     row1 cf:col1 []    value
> >>     row1 cf:col2 []    value
> >>     row1 cf:col999 []    value
> >>     row2 cf:col1 []    value
> >>     root@test foo> setiter -class org.apache.accumulo.core.iterators.FirstEntryInRowIterator -p 99 -scan
> >>     Only allows iteration over the first entry per row
> >>     ----------> set FirstEntryInRowIterator parameter scansBeforeSeek, Number of scans to try before seeking [10]: 10
> >>     root@test foo> egrep .*
> >>     row1 cf:col1 []    value
> >>     row2 cf:col1 []    value
> >>
> >>
> >>     On Fri, Oct 11, 2013 at 10:53 AM, Terry P. <texpilot@gmail.com> wrote:
> >>     > Hi guys,
> >>     > I'm still a bit of a newbie as I'm more of an admin than a developer, and
> >>     > now that formal testing has begun, I have testers asking me how to get a
> >>     > total count of records in Accumulo for verification purposes after test
> >>     > ingests have been run.
> >>     >
> >>     > In our case when I say "records" I mean the number of distinct rowkeys, not
> >>     > the total number of entries.
> >>     >
> >>     > Is there any way to do this using just the Accumulo shell, maybe by writing
> >>     > an aggregator or other class that can be run from within the Accumulo shell?
> >>     > Many thanks in advance,
> >>     > Terry
> >>     >
> >>     >
> >>     > On Tue, Jan 22, 2013 at 6:03 PM, Terry P. <texpilot@gmail.com> wrote:
> >>     >>
> >>     >> Greetings everyone,
> >>     >> I want to simply get the total count of rows in a table using the accumulo
> >>     >> shell.  I'm very new to Accumulo so I apologize if it's a newbie question.
> >>     >>
> >>     >> I'm prototyping with the accumulo shell, and love how it can ingest
> >>     >> records using exefile, so I've used python to generate a lot of test data.
> >>     >> For some test cases in this sprint I need to verify the rows loaded match
> >>     >> what's expected, hence the reason I need to get the total rows in a table.
> >>     >>
> >>     >> I'd bet there is some way to use setiter or setscaniter with the -agg
> >>     >> option, but I can't figure it out.
> >>     >>
> >>     >> Any help would be greatly appreciated.
> >>     >>
> >>     >> Best regards,
> >>     >> Terry
> >>     >
> >>     >
> >>
> >>
> >
>
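
For small test tables, the distinction drawn in the thread between distinct rowkeys and total entries can also be checked client-side. The following is only a minimal sketch, not something proposed by anyone in the thread: it assumes scan output lines shaped like the ones shown above ("row1 cf:col1 []    value") and row IDs that contain no spaces. The function name count_distinct_rows is made up for illustration.

```python
import sys


def count_distinct_rows(lines):
    """Count distinct row IDs in Accumulo shell scan output.

    Assumption: each entry line looks like 'row1 cf:col1 []    value'
    and row IDs contain no whitespace. Adjust the parsing otherwise.
    """
    rows = set()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        parts = line.split(None, 1)
        if len(parts) < 2:
            continue  # not an entry line
        # Entry lines carry a 'family:qualifier' token right after the row ID;
        # anything else (for example an echoed prompt) is ignored.
        key_token = parts[1].split(None, 1)[0]
        if ":" not in key_token:
            continue
        rows.add(parts[0])
    return len(rows)


if __name__ == "__main__":
    # Reads scan output from standard input and prints the distinct row count.
    print(count_distinct_rows(sys.stdin))
```

Saved under a hypothetical name such as count_rows.py, it could be fed from the command line Jared gave, with the -np option Josh mentioned: accumulo shell -u username -p password -e "scan -t foo -np" | python count_rows.py. This pulls every entry to the client, so it is only reasonable for small test data sets. Unlike the setiter approach, it leaves no iterator configured on the table.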
