accumulo-notifications mailing list archives

From "Christopher Tubbs (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ACCUMULO-164) Add support for wildcards/regexes in locality group setting.
Date Thu, 07 Feb 2013 01:29:12 GMT

    [ https://issues.apache.org/jira/browse/ACCUMULO-164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13573081#comment-13573081 ]

Christopher Tubbs commented on ACCUMULO-164:
--------------------------------------------

I'm not necessarily against the feature, but I would like to expand on the opposition [~elserj]
[mentioned|#comment-13570560] above, under the "*Against*" section (apologies in advance for
its lack of brevity):

The objection is that the basic role of column families is to create logical groups of columns
with increased locality within a row. To me, the most intuitive application of column families
as logical groups is to have a discrete set of them. It seems to me that in applications that
have continuous variability in the column family, they could just as easily have this variability
in the column qualifier. Indeed, it seems to me that is what the column qualifier should be
used for: variability needed to uniquely identify a value within a discrete logical grouping
provided by the column family.
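
To make the distinction concrete, here is a minimal sketch in plain Python (it does not use the Accumulo API) that models a key as a (row, family, qualifier) tuple. The rows, the family names, and the helper `move_variability_to_qualifier` are all hypothetical and chosen only for illustration. It takes a schema whose column family carries a continuously varying part and rewrites it so that the family comes from a small discrete set and the varying part lives in the qualifier.

```python
# Hypothetical keys: (row, column family, column qualifier).
# In this schema the family carries a continuously varying part (a date).
varying_family_keys = [
    ("user1", "event_20130205", "login"),
    ("user1", "event_20130206", "logout"),
    ("user1", "profile", "name"),
]


def move_variability_to_qualifier(key, sep="_"):
    """Keep only the family prefix as the family and prepend the
    varying suffix to the qualifier. Keys whose family has no
    separator are returned unchanged."""
    row, family, qualifier = key
    prefix, found, suffix = family.partition(sep)
    if not found:
        return key  # family is already one of the discrete names
    return (row, prefix, suffix + sep + qualifier)


discrete_family_keys = [
    move_variability_to_qualifier(k) for k in varying_family_keys
]

# The set of distinct families before and after the rewrite.
families_before = {family for _, family, _ in varying_family_keys}
families_after = {family for _, family, _ in discrete_family_keys}
```

In this toy data the rewritten keys still identify each value uniquely, but the set of families no longer grows with the data, so it can be listed explicitly.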

If the column family is not used this way, then it seems to me that the column family just
gets reduced to "column element 1 that sorts after row" and the column qualifier gets reduced
to "column element 2 that sorts after column element 1". While I realize some applications
may already be using these elements of the key in this way, don't need discrete column families,
and simply find it convenient to break up their columns into two pieces for whatever reason,
I think that these applications are breaking the basic data model provided by the API corresponding
to an Accumulo table (which is already pretty basic to begin with).

While it's fine for these applications to break the basic data model implied by the structured
key (reducing it to "sorted key dimension 1", "sorted key dimension 2", "sorted key dimension
3", etc.), I think that when they do, they make it that much harder to express, with a common
language, their particular table schemas (when a row doesn't mean row in any traditional database
sense at all, when a family doesn't mean a collection of related items, when a qualifier doesn't
mean uniqueness, when a value doesn't actually get used to hold the contents of a cell identified
by the key).

I personally think that this increase in difficulty to express the intentions and uses of
any particular element of the structured key, when these intentions become nothing more than
nominal, raises the barrier to entry and makes the API more confusing.

All that said, I think the proposed feature encourages table schemas to break the basic data
model of discrete logical groupings of related columns in a row, and I think that existing
schemas that rely on variability in the column family could nearly as easily rely on that
variability in the qualifier. I also think the use of discrete column families is more easily
expressed in documentation, in the API, in examples, and reduces the complexity of table schemas
overall.
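
The trade-off can be sketched as follows. This is a plain-Python model, not Accumulo's implementation: the group names, family names, and both lookup helpers are hypothetical. It contrasts a locality group configuration that lists families explicitly with a pattern-based one in the spirit of the proposed feature. The explicit form is enough when families are discrete, while families that vary continuously only match under the pattern-based form.

```python
import re

# Hypothetical configuration: locality group name -> explicit set of families.
explicit_groups = {"events": {"event"}, "meta": {"profile"}}

# Hypothetical pattern-based configuration, in the spirit of the proposal.
pattern_groups = {
    "events": re.compile(r"event_\d{8}"),
    "meta": re.compile(r"profile"),
}


def group_by_set(family, groups):
    """Return the group whose explicit family set contains `family`,
    or None if no configured group lists it."""
    for name, families in groups.items():
        if family in families:
            return name
    return None


def group_by_pattern(family, groups):
    """Return the first group whose regex matches the whole of `family`,
    or None if no configured pattern matches."""
    for name, pattern in groups.items():
        if pattern.fullmatch(family):
            return name
    return None
```

With discrete families, `group_by_set("event", explicit_groups)` finds the group directly. A family such as `event_20130205` is found only by `group_by_pattern`, which is the convenience the feature would add for existing schemas.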

However, I also understand that it may be very convenient to have this in many applications
(particularly those existing applications that don't want to redefine working table schemas
to take advantage of locality groups), so I'm not necessarily against the feature. I would
just like to see it, and other instances of the basic data model implied by the structured
key being broken, as an "expert" feature, and not the norm.
                
> Add support for wildcards/regexes in locality group setting.
> ------------------------------------------------------------
>
>                 Key: ACCUMULO-164
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-164
>             Project: Accumulo
>          Issue Type: Improvement
>          Components: client, master, tserver
>            Reporter: John Vines
>             Fix For: 1.6.0
>
>
> We should look into adding the ability to specify locality group columns as either wildcarding
> or regexes. I'm unsure of the feasibility of this, hence the lack of fix date.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
