accumulo-notifications mailing list archives

From "Christopher Tubbs (JIRA)" <>
Subject [jira] [Commented] (ACCUMULO-1228) Allow clients to disable column families and locality groups
Date Thu, 04 Apr 2013 17:05:16 GMT


Christopher Tubbs commented on ACCUMULO-1228:

{quote}I still don't agree that there's an issue for us to be concerned over with client code
breaking after someone *knowingly* redefines a locality group.{quote}

You have to consider who is doing this modification. This type of modification seems to me
most likely to occur when you have different data types occupying the same table, and query
patterns change over time, and an admin changes locality configuration to improve performance
for those types of queries. The most obvious such transition is from no locality groups (everything
in the "default" locality group) to 2 locality groups (1 named, 1 default). Tailoring configuration
to optimize for different query loads over time seems like a system administrative function
to me, and I don't think that performing this action should break everybody's code that has
been written with the assumption that the locality groups are static, nor do I see a substantial
reason to encourage such assumptions by exposing that behavior in the public API.

Locality groups do *not* represent the data model and should not be queried as such. They
represent query patterns, and an optimization for changing load requirements.

Now, it is a trivial matter to make the additional assumption that the query load is congruent
with the data model for users who know and accept all the risks and implications of doing
so... but that case can easily be satisfied by storing some concept of a ColumnFamilyGroup
in the application code, and using that as a source for both the maintenance functions (configuring
locality groups) *and* the query functions (constructing scanners).

Another trivial option to satisfy the same use case is a very simple utility class that reads
the current locality groups, and overloads their purpose to also represent a set of column
families over which to query. Such a utility comes with risks, though, and it could be confusing
to expose such a thing in the public API. I think this sort of thing is best left in an obscure
package with lots of javadoc admonishments, or in user/application code.
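Below is a deliberately small sketch of such a utility. It assumes the caller has already read the table's configuration, for example with `TableOperations.getLocalityGroups`, and converted its `Text` values to `String`. The names `LocalityGroupQueryHelper` and `familiesOf` are hypothetical. The javadoc carries the kind of admonishment suggested above.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

/**
 * Hypothetical helper (not part of Accumulo) that reinterprets a table's current
 * locality group configuration as a set of column families to query.
 *
 * WARNING: locality groups describe storage layout for a query load, not the data
 * model. An administrator may redefine them at any time, so the same call can return
 * different families (or none) from one day to the next.
 */
final class LocalityGroupQueryHelper {
    private LocalityGroupQueryHelper() {}

    /**
     * @param localityGroups group name -> column families, e.g. a String-converted copy
     *        of what TableOperations.getLocalityGroups returned
     * @param groupName the named group to look up
     * @return the group's families, or empty if no such group is currently defined.
     *         The default group is never returned: it has no fixed family list, since
     *         it holds every family not assigned to a named group.
     */
    static Optional<Set<String>> familiesOf(Map<String, Set<String>> localityGroups, String groupName) {
        Set<String> families = localityGroups.get(groupName);
        if (families == null) {
            return Optional.empty();
        }
        return Optional.of(Collections.unmodifiableSet(families));
    }
}
```

Returning `Optional` forces the caller to handle a group that an administrator has removed or renamed. Otherwise, the application might silently scan the wrong families.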

{quote}Scan profiles sound like they could be really useful.{quote}
+10, and we're halfway there with IteratorSetting objects and iterator profiles in the shell.

Another observation:
I think the mapreduce API for setting the Set of columns over which to query might actually
be more intuitive than the appending "fetchColumn" and "fetchColumnFamily" methods in our
main Scanner API. A fetch method that accepts a set/group of columns/families would make
it even more trivial to have a one-liner that reuses locality group info (or other column
family group info) to configure a scanner.
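A set-based fetch could be a thin wrapper over the existing appending calls. This sketch makes two assumptions so that it is self-contained. `Column` is a family with an optional qualifier, where `null` means the whole family. `ScannerColumns` is a stand-in for the Scanner's `fetchColumnFamily`/`fetchColumn` pair. Neither is an Accumulo type, and `ColumnSets.fetchColumns` is a hypothetical name.

```java
import java.util.Collection;
import java.util.Objects;

// Hypothetical value type: a column family plus an optional qualifier (null = whole family).
final class Column {
    final String family;
    final String qualifier; // may be null

    Column(String family, String qualifier) {
        this.family = Objects.requireNonNull(family, "family");
        this.qualifier = qualifier;
    }
}

// Hypothetical stand-in for the two appending methods on a Scanner.
interface ScannerColumns {
    void fetchColumnFamily(String family);

    void fetchColumn(String family, String qualifier);
}

final class ColumnSets {
    private ColumnSets() {}

    // Hypothetical set-based fetch: one call configures every requested column,
    // dispatching each entry to the matching appending method.
    static void fetchColumns(ScannerColumns scanner, Collection<Column> columns) {
        for (Column c : columns) {
            if (c.qualifier == null) {
                scanner.fetchColumnFamily(c.family);
            } else {
                scanner.fetchColumn(c.family, c.qualifier);
            }
        }
    }
}
```

With a helper like this, reusing group info becomes a single call that passes a collection built from one group's families.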
> Allow clients to disable column families and locality groups
> ------------------------------------------------------------
>                 Key: ACCUMULO-1228
>                 URL:
>             Project: Accumulo
>          Issue Type: New Feature
>          Components: client, tserver
>    Affects Versions: 1.5.0
>            Reporter: William Slacum
>            Priority: Minor
>             Fix For: 1.6.0
> There's an inconsistency between what a server is capable of and what a client can tell
it to do with respect to fetching column families.
> Currently, a user can tell a {{Scanner}} to fetch some set of column families. The iterators
support not only this, but also the converse where a user does not want to retrieve column
families. An iterator implementation can do this by hand, but a client cannot specifically
tell a Scanner to not return data from a set of column families. Clients should be able to
specify this option.
> There also seems to be an inconsistency with how locality groups are defined and then
utilized. If I want to specify a set of column families as being part of a locality group,
I have to provide a mapping of locality group name to a list of column families. If I want
to fetch a locality group, I have to get the mapping first, rather than just set which locality
group I want to use. It'd be more convenient to tell the scanner just to fetch which locality
groups I want, and have the server know which column families that means.
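On the iterator side, the `inclusive` flag of `SortedKeyValueIterator.seek(Range, Collection<ByteSequence>, boolean)` expresses the converse. A value of `true` means only the given families, and `false` means everything except them. The plain-Java model below simulates that selection rule locally to make the semantics concrete. `FamilySelection`, `Cell`, and `select` are hypothetical names. Filtering an in-memory list is not how a tablet server skips data.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

final class FamilySelection {
    private FamilySelection() {}

    // Hypothetical minimal cell: only the fields this sketch needs.
    static final class Cell {
        final String row;
        final String family;
        final String value;

        Cell(String row, String family, String value) {
            this.row = row;
            this.family = family;
            this.value = value;
        }
    }

    // inclusive == true: keep only the listed families (what fetchColumnFamily requests).
    // inclusive == false: drop the listed families (the converse the issue asks for).
    static List<Cell> select(List<Cell> cells, Collection<String> families, boolean inclusive) {
        List<Cell> out = new ArrayList<>();
        for (Cell c : cells) {
            if (families.contains(c.family) == inclusive) {
                out.add(c);
            }
        }
        return out;
    }
}
```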

This message is automatically generated by JIRA.
