From: Eric Newton
To: "user@accumulo.apache.org"
Date: Fri, 11 Oct 2013 15:15:34 -0400
Subject: Re: How to get count of table rows using accumulo shell
In-Reply-To: <52584D2F.4050605@gmail.com>
References: <52584D2F.4050605@gmail.com>

Actually, the egrep was used on purpose: it's the only way to get the shell
to use the BatchScanner, which can talk to multiple tservers at once.

-Eric

On Fri, Oct 11, 2013 at 3:10 PM, Josh Elser wrote:
> You'll need to add the '-np' option on the scan command as well.
>
>
> On 10/11/2013 03:05 PM, Jared Winick wrote:
>>
>> After following the commands Eric lists to set the iterator for that
>> table, instead of running 'egrep' in the shell, you could do this from the
>> Linux command line
>>
>> accumulo shell -u username -p password -e "scan -t foo" | wc -l
>>
>>
>> On Fri, Oct 11, 2013 at 11:42 AM, Eric Newton wrote:
>>
>> You can stack a counting Combiner over the FirstEntryInRowIterator and
>> batch scan the table. If it's just a test data set with under a
>> billion rows, you can just count the result set coming out of the
>> FirstEntryInRowIterator. You'll be I/O bound at the client, but it
>> will work.
>>
>> This does it with the shell, but the output is kinda voluminous:
>>
>> root@test> createtable foo
>> root@test foo> insert row1 cf col1 value
>> root@test foo> insert row1 cf col2 value
>> root@test foo> insert row1 cf col999 value
>> root@test foo> insert row2 cf col1 value
>> root@test foo> scan
>> row1 cf:col1 [] value
>> row1 cf:col2 [] value
>> row1 cf:col999 [] value
>> row2 cf:col1 [] value
>> root@test foo> setiter -class org.apache.accumulo.core.iterators.FirstEntryInRowIterator -p 99 -scan
>> Only allows iteration over the first entry per row
>> ----------> set FirstEntryInRowIterator parameter scansBeforeSeek, Number of scans to try before seeking [10]: 10
>> root@test foo> egrep .*
>> row1 cf:col1 [] value
>> row2 cf:col1 [] value
>>
>>
>> On Fri, Oct 11, 2013 at 10:53 AM, Terry P. wrote:
>> > Hi guys,
>> > I'm still a bit of a newbie as I'm more of an admin than a developer, and
>> > now that formal testing has begun, I have testers asking me how to get a
>> > total count of records in Accumulo for verification purposes after test
>> > ingests have been run.
>> >
>> > In our case when I say "records" I mean the number of distinct rowkeys, not
>> > the total number of entries.
>> >
>> > Is there any way to do this using just the Accumulo shell, maybe by writing
>> > an aggregator or other class that can be run from within the Accumulo shell?
>> >
>> > Many thanks in advance,
>> > Terry
>> >
>> >
>> > On Tue, Jan 22, 2013 at 6:03 PM, Terry P. wrote:
>> >>
>> >> Greetings everyone,
>> >> I want to simply get the total count of rows in a table using the accumulo
>> >> shell. I'm very new to Accumulo so I apologize if it's a newbie question.
>> >>
>> >> I'm prototyping with the accumulo shell, and love how it can ingest
>> >> records using exefile, so I've used python to generate a lot of test data.
>> >> For some test cases in this sprint I need to verify the rows loaded match
>> >> what's expected, hence the reason I need to get the total rows in a table.
>> >>
>> >> I'd bet there is some way to use setiter or setscaniter with the -agg
>> >> option, but I can't figure it out.
>> >>
>> >> Any help would be greatly appreciated.
>> >>
>> >> Best regards,
>> >> Terry
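
For anyone who would rather count distinct rowkeys on the client without setting the iterator first, here is a minimal sketch of one way to do it. The function name count_distinct_rows is made up for illustration, and the sketch assumes the default shell output format shown in the thread ("row cf:cq [visibility] value") with no whitespace inside the row ID. The sample data is copied from the scan output quoted above.

```shell
#!/bin/sh
# Count distinct row IDs in Accumulo shell scan output read from stdin.
# Assumption: each entry line looks like "row cf:cq [visibility] value",
# so the row ID is the first whitespace-delimited field.
# count_distinct_rows is a hypothetical helper name, not an Accumulo command.
count_distinct_rows() {
    # Keep only lines whose second field contains a colon (cf:cq), which
    # skips anything that is not an entry line, then count unique row IDs.
    awk 'NF >= 2 && $2 ~ /:/ { rows[$1] = 1 }
         END { n = 0; for (r in rows) n++; print n }'
}

# Sample data copied from the scan output earlier in the thread.
sample='row1 cf:col1 [] value
row1 cf:col2 [] value
row1 cf:col999 [] value
row2 cf:col1 [] value'

printf '%s\n' "$sample" | count_distinct_rows
```

Against a real table, the input would presumably come from the command given in the thread with the '-np' option added, for example: accumulo shell -u username -p password -e "scan -t foo -np" | count_distinct_rows. This ships every entry to the client, so it is slower than counting the output of the FirstEntryInRowIterator and is only reasonable for small test tables.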