From: Josh Elser
Date: Sat, 04 Oct 2014 21:23:44 -0400
To: user@accumulo.apache.org
Subject: Re: Determining tablets assigned to table splits, and the number of rows in each tablet
Message-ID: <54309DA0.5090008@gmail.com>
References: <54300548.2070708@gmail.com>

I'll re-state it: I'd be happy to work with you to figure out some Java
APIs for clients to consume for these kinds of metrics. A JIRA issue is
the best way to encapsulate this. Would also love to help you provide a
patch for it, too :)

The biggest concern (at least for creating an API for entries in a table
-- by tablet/tabletserver/otherwise) is going to be that the number of
entries is an approximation, not definitive. This is not prohibitive,
though, as long as we're clear that it is an approximation and not an
exact metric.

Dylan Hutchison wrote:
> It should suffice to list the number of entries for a table, tablet
> and tablet server. No need to worry about number of unique rows,
> number of unique column families, etc. By entry I mean number of
> (key,value)s.
>
> For load balancing, we care about how much physical data is on each
> tablet / tablet server. This is directly proportional to the number
> of entries, assuming that the key size and value size in bytes do not
> differ too drastically.
> If they do (say for raw documents of vastly
> different sizes), the best measure is the /size of the data in bytes/
> for each tablet / tablet server. I didn't suggest it because it
> doesn't look like Accumulo tracks it, so it would involve a lot of new
> implementation and book-keeping, which could hamper performance.
>
> Accumulo does already track the number of entries for tables, tablets
> and tablet servers. It's just hard to get to, relying on the format of
> the metadata table and accessing the non-public Monitor classes.
> Bringing it to the public API just looks like a matter of reworking
> the API and letting the client gather the information that the Monitor
> already does by connecting to each tablet server. Does that sound
> reasonable?
>
> Regards, Dylan
>
> On Sat, Oct 4, 2014 at 4:11 PM, David Medinets wrote:
>
>     Adding this functionality into Accumulo's API would reduce its
>     efficiency for users that don't need this level of tracking. Let
>     ingest procedures take the performance hit. There are
>     synchronization issues that degrade performance. Also, what
>     would be the appropriate level of tracking - at the row,
>     column-family, or every level? Whatever answer you give, someone
>     else will ask for something different. And then there are the
>     aggregation questions. Not to mention the additional storage
>     requirements.
>
> --
> www.cs.stevens.edu/~dhutchis
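
To make the entries-versus-bytes point above concrete, here is a minimal sketch in plain Java. It is not Accumulo API: the class `TabletLoadSketch`, the type `TabletStat`, and the sample numbers are all hypothetical, and the per-tablet counts are assumed to have been gathered already (and to be approximate). It only shows how per-tablet figures roll up per tablet server, and how two servers with equal entry counts can still hold very different amounts of data.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; none of these names exist in Accumulo.
final class TabletLoadSketch {

    /** Approximate statistics for one tablet (hypothetical type). */
    static final class TabletStat {
        final String server;   // tablet server hosting the tablet
        final long entries;    // approximate number of (key,value) entries
        final long bytes;      // approximate size of the data in bytes

        TabletStat(String server, long entries, long bytes) {
            this.server = server;
            this.entries = entries;
            this.bytes = bytes;
        }
    }

    /** Sums the approximate entry counts of all tablets on each server. */
    static Map<String, Long> entriesPerServer(List<TabletStat> stats) {
        Map<String, Long> totals = new LinkedHashMap<>();
        for (TabletStat s : stats) {
            totals.merge(s.server, s.entries, Long::sum);
        }
        return totals;
    }

    /** Sums the approximate data size of all tablets on each server. */
    static Map<String, Long> bytesPerServer(List<TabletStat> stats) {
        Map<String, Long> totals = new LinkedHashMap<>();
        for (TabletStat s : stats) {
            totals.merge(s.server, s.bytes, Long::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        // Made-up sample data: ts2 stores far larger values than ts1.
        List<TabletStat> stats = Arrays.asList(
            new TabletStat("ts1", 1000L, 50_000L),
            new TabletStat("ts1", 500L, 25_000L),
            new TabletStat("ts2", 1500L, 7_500_000L));

        System.out.println("entries: " + entriesPerServer(stats));
        System.out.println("bytes:   " + bytesPerServer(stats));
    }
}
```

With this made-up data both servers report the same entry total while their byte totals differ by two orders of magnitude, which is the case where entry counts stop being a good proxy for load.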