Subject: [jira] [Commented] (ACCUMULO-1228) Allow clients to disable column families and locality groups
From: "Josh Elser (JIRA)"
To: notifications@accumulo.apache.org
Date: Thu, 4 Apr 2013 00:49:15 +0000 (UTC)

[ https://issues.apache.org/jira/browse/ACCUMULO-1228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13621597#comment-13621597 ]

Josh Elser commented on ACCUMULO-1228:
--------------------------------------

It would be super useful to have the column family fetch/restrict capability exposed through the client API. It can significantly improve query performance, especially when you can exclude every column you know you don't need to look at.

bq. Locality groups are an admin feature, for optimizing the data storage for more efficient queries

Why do you call them an admin feature?
In its simplest form, a locality group is nothing more than a group of columns that we "set aside". Pulling out my BigTable paper, they specifically say "Clients can group multiple column families together...". As such, a locality group is loosely defined as some group of columns that are "accessed together", which, in my mind, is something a client would be aware of when issuing a query.

I suppose I disagree with you comparing them to a codec. True, the client doesn't care about the codec used to compress the data, just as the client doesn't care about the distribution of files backing the table(s), but it matters for a client to understand the layout of its data, and clients can use that knowledge to retrieve it more effectively.

As for the argument about a locality group changing and potentially not being propagated to disk yet: the locality group is nothing but a convenient shorthand, allowing a client to fetch multiple columns with a single call. Sure, you don't get the optimization, but the API is still much simpler for the user.

> Allow clients to disable column families and locality groups
> ------------------------------------------------------------
>
>                 Key: ACCUMULO-1228
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-1228
>             Project: Accumulo
>          Issue Type: New Feature
>          Components: client, tserver
>    Affects Versions: 1.5.0
>            Reporter: William Slacum
>            Priority: Minor
>             Fix For: 1.6.0
>
>
> There's an inconsistency between what a server is capable of and what a client can tell it to do with respect to fetching column families.
> Currently, a user can tell a {{Scanner}} to fetch some set of column families. The iterators support not only this, but also the converse where a user does not want to retrieve column families. An iterator implementation can do this by hand, but a client cannot specifically tell a Scanner to not return data from a set of column families. Clients should be able to specify this option.
> There also seems to be an inconsistency with how locality groups are defined and then utilized. If I want to specify a set of column families as being part of a locality group, I have to provide a mapping of locality group name to a list of column families. If I want to fetch a locality group, I have to get the mapping first, rather than just set which locality group I want to use. It'd be more convenient to tell the scanner just to fetch which locality groups I want, and have the server know which column families that means.
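For illustration only, the sketch below models the two requested client options in plain Java rather than against Accumulo itself, so it runs standalone. The class `ColumnFamilyFilterSketch`, the `Cell` record and the methods `fetch`, `exclude` and `familiesForGroups` are hypothetical. In the real client, inclusive fetching is `Scanner.fetchColumnFamily(Text)`, and the group mapping a client must look up first comes from `TableOperations.getLocalityGroups(String)`. The sketch shows only the filtering semantics, not server-side iterators or on-disk locality group I/O.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical, self-contained model of the two client-side options requested in
// ACCUMULO-1228. None of these names are Accumulo API; they only illustrate the semantics.
public class ColumnFamilyFilterSketch {

    // Simplified cell: row, column family, value (no qualifier, visibility or timestamp).
    public record Cell(String row, String family, String value) {}

    // Inclusive fetch: keep cells whose family is in the set.
    // An empty set keeps everything, like a scanner with no fetched columns.
    public static List<Cell> fetch(List<Cell> cells, Set<String> families) {
        List<Cell> out = new ArrayList<>();
        for (Cell c : cells) {
            if (families.isEmpty() || families.contains(c.family())) {
                out.add(c);
            }
        }
        return out;
    }

    // The requested converse: drop cells whose family is in the set.
    public static List<Cell> exclude(List<Cell> cells, Set<String> families) {
        List<Cell> out = new ArrayList<>();
        for (Cell c : cells) {
            if (!families.contains(c.family())) {
                out.add(c);
            }
        }
        return out;
    }

    // The requested shorthand: resolve locality group names to their column families
    // using the table's group-name -> families mapping.
    public static Set<String> familiesForGroups(Map<String, Set<String>> groups, List<String> names) {
        Set<String> families = new LinkedHashSet<>();
        for (String name : names) {
            Set<String> members = groups.get(name);
            if (members == null) {
                throw new IllegalArgumentException("unknown locality group: " + name);
            }
            families.addAll(members);
        }
        return families;
    }

    public static void main(String[] args) {
        List<Cell> cells = List.of(
                new Cell("r1", "meta", "m1"),
                new Cell("r1", "text", "t1"),
                new Cell("r1", "image", "i1"));

        Map<String, Set<String>> groups = new LinkedHashMap<>();
        groups.put("content", Set.of("text", "image"));
        groups.put("small", Set.of("meta"));

        // What a client can express today: name the families it wants.
        System.out.println(fetch(cells, Set.of("meta")));
        // Exclusion: name only the families it does not want.
        System.out.println(exclude(cells, Set.of("image")));
        // Fetch by locality group name, expanded to families before filtering.
        System.out.println(fetch(cells, familiesForGroups(groups, List.of("content"))));
    }
}
```

Exclusion is just the complement of the inclusion predicate, which is why iterators can already do it; the issue only asks for it to be exposed to clients. Resolving a group name on the client, as `familiesForGroups` does here, needs the mapping first, which is the extra step the issue would like the server to handle. As the comment notes, if the group configuration has changed but not yet reached disk, the shorthand still returns the same columns; only the I/O benefit is lost.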