accumulo-notifications mailing list archives

From "Christopher Tubbs (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ACCUMULO-164) Add support for wildcards/regexes in locality group setting.
Date Thu, 07 Feb 2013 17:27:13 GMT

    [ https://issues.apache.org/jira/browse/ACCUMULO-164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13573687#comment-13573687 ]

Christopher Tubbs commented on ACCUMULO-164:
--------------------------------------------

{quote}
The feature in its simplest sense does conflict with the original idea behind locality groups,
but is that always "bad"? I'm not sure, but it's definitely different.
{quote}
We've already extended the original idea behind locality groups by allowing users to specify
more than one column family for a locality group. And I think that is definitely not "bad"
("good", even). This is just an easier way to select multiple families to put in a locality
group, based on a common characteristic (like a common prefix).
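
To make the "select multiple families by a common characteristic" idea concrete, here is a minimal sketch using only the Java standard library. The class `FamilySelector` and its `select` method are hypothetical names invented for illustration; they are not part of Accumulo or of any patch on this issue, and families are modeled as plain strings for simplicity.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Pattern;

// Hypothetical helper, not an Accumulo class.
public class FamilySelector {

    // Returns every family whose full name matches the pattern.
    // Each match remains a distinct family; the pattern only selects them.
    public static Set<String> select(Collection<String> families, Pattern pattern) {
        Set<String> selected = new TreeSet<>();
        for (String family : families) {
            if (pattern.matcher(family).matches()) {
                selected.add(family);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        // "meta_.*" expresses the common-prefix case as a regex (assumed example data).
        Set<String> group = select(
                Arrays.asList("meta_a", "meta_b", "data"),
                Pattern.compile("meta_.*"));
        System.out.println(group);
    }
}
```

The result is an ordinary set of distinct family names, which is the shape a locality group with several explicitly listed families already has today.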

However, I question why something like "common prefix" should be a desirable selection mechanism
for multiple families in the first place. Not only are these data (in the common prefix case)
already naturally grouped locally without any use of locality groups, but it's also not clear
to me that something like "common prefix" is the most sensible way to group related families
in the general case. I'm not sure there *is* a general case, though. Perhaps len < 4 is
more useful than identifying a common prefix for some users? Further, the only application
for this that I can think of is when users introduce variability into the family that allows
the number of distinct families to grow continuously (which, I think, can and should be
done in the qualifier instead). So, I personally see little benefit to it (at least for the
common prefix case; full regexes or suffixes would certainly have greater benefit).

Maybe the most useful and general thing we could do to help users select families for a
locality group is to let them inject a user-defined hash function (maybe in JEXL?) that bins
families into discrete localities by an arbitrary method of their choosing?
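
The binning idea can be sketched as follows, assuming the user-defined function is supplied as a plain `java.util.function.Function` rather than a JEXL expression. The class `FamilyBinner`, the `bin` method, and the group names `short` and `long` are all hypothetical and exist only for illustration; nothing here reflects an actual Accumulo interface.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.function.Function;

// Hypothetical helper, not an Accumulo class.
public class FamilyBinner {

    // Applies the user-supplied function to each family and collects
    // the families under the group name the function returns.
    public static Map<String, Set<String>> bin(
            Collection<String> families, Function<String, String> binner) {
        Map<String, Set<String>> groups = new TreeMap<>();
        for (String family : families) {
            groups.computeIfAbsent(binner.apply(family), k -> new TreeSet<>())
                  .add(family);
        }
        return groups;
    }

    public static void main(String[] args) {
        // The "len < 4" rule from the comment, written as a binning function.
        Function<String, String> byLength =
                family -> family.length() < 4 ? "short" : "long";
        System.out.println(bin(Arrays.asList("a", "tag", "attributes"), byLength));
    }
}
```

A prefix, suffix, or regex rule is then just another function passed to `bin`, which is why a pluggable function would be more general than any single built-in matching mode.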

{quote}
Do you have any ideas on how to present such a feature that would avoid steering the common
user toward it? Is healthy warning/documentation sufficient?
{quote}
If implemented, I think documentation should be sufficient to address all of my concerns.
The main thing is just to make it clear that the feature is used to *select* multiple column
families, so that it's not implied that families with variability *are* the same "family".
The API treats non-equal families as distinct, and that's how we should discuss them.
                
> Add support for wildcards/regexes in locality group setting.
> ------------------------------------------------------------
>
>                 Key: ACCUMULO-164
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-164
>             Project: Accumulo
>          Issue Type: Improvement
>          Components: client, master, tserver
>            Reporter: John Vines
>             Fix For: 1.6.0
>
>
> We should look into adding the ability to specify locality group columns as either wildcarding
> or regexes. I'm unsure of the feasibility of this, hence the lack of fix date.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
