hbase-issues mailing list archives

From "Daniel Vimont (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (HBASE-17257) Add column-aliasing capability to hbase-client
Date Thu, 15 Dec 2016 05:24:59 GMT

    [ https://issues.apache.org/jira/browse/HBASE-17257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15728144#comment-15728144
] 

Daniel Vimont edited comment on HBASE-17257 at 12/15/16 5:24 AM:
-----------------------------------------------------------------

Here are some specifications of what I’ve currently designed and coded for column-aliasing
(and will soon be submitting as a patch)...


*COLUMN-ALIASING FOR THE END-USER*:
From the end-user perspective, column-aliasing entails the following two things...


(1) _Environmental configuration to enable aliasing_:
Aliasing makes use of the already-existing {{hbase.client.connection.impl}} configuration
parameter. The following entry should be added to {{hbase-site.xml}}:
{code}
  <property>
    <name>hbase.client.connection.impl</name>
    <value>org.apache.hadoop.hbase.client.AliasEnabledConnection</value>
  </property>
{code}
Setting this parameter to this value results in {{ConnectionFactory#createConnection}} returning
Connections of the new {{AliasEnabledConnection}} class (subclass of the {{ConnectionImplementation}}
class).


(2) _Alias-enabling individual column families_:
Aliasing is enabled at the column-family level. When adding a column-descriptor (i.e. family)
to a Table, the new method {{HColumnDescriptor#setAliasSize}} may be invoked to immutably
set the fixed size (in bytes) of column-qualifier aliases for the column family. The default
value of 0 (aliasing disabled) may be changed to either 1, 2, or 4.
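
As a rough illustration of the rule above, the following self-contained sketch models a column-family descriptor that accepts only 0, 1, 2, or 4 as an alias size. The class {{FamilyAliasSettingsSketch}} is hypothetical and merely stands in for {{HColumnDescriptor}}; reading "immutably" as "a nonzero size cannot be changed once set" is an assumption of the sketch, and the validation in the patch may differ.

```java
// Hypothetical stand-in for HColumnDescriptor; not the code from the patch.
final class FamilyAliasSettingsSketch {
    private int aliasSize = 0; // 0 means aliasing is disabled (the default)

    /** Accepts 0, 1, 2, or 4; a nonzero size is assumed to be settable only once. */
    FamilyAliasSettingsSketch setAliasSize(int size) {
        if (size != 0 && size != 1 && size != 2 && size != 4) {
            throw new IllegalArgumentException("aliasSize must be 0, 1, 2, or 4 but was " + size);
        }
        if (aliasSize != 0 && size != aliasSize) {
            throw new IllegalStateException("aliasSize is already set to " + aliasSize);
        }
        aliasSize = size;
        return this;
    }

    int getAliasSize() {
        return aliasSize;
    }

    /** True when the family stores aliases instead of full qualifiers. */
    boolean isAliasEnabled() {
        return aliasSize != 0;
    }
}
```

With this model, {{new FamilyAliasSettingsSketch().setAliasSize(2)}} yields an alias-enabled family, while a later call with 4 (or any call with 3) is rejected.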


Other than the above, the end-user-application code should neither require nor contain any
“awareness” whatsoever that column-aliasing is being utilized for a column-family. An
end-user-application continues to interact only with the standard interfaces of the client
API ({{Connection}}, {{Table}}, {{BufferedMutator}}, and {{HTableMultiplexer}}).


*COLUMN-ALIASING INTERNALS*:
One of the overriding goals in designing the column-aliasing infrastructure is to minimize
alterations and insertions into already-existing hbase-client code, and to have very-close-to-zero
impact on already-existing functionality, particularly in those situations in which aliasing
will NOT be used. The following is a comprehensive list of all new and modified modules, along
with an explanation as to the role that the new or modified module plays in aliasing.


_Modified classes_:
*HColumnDescriptor*: as described above, the new method {{#setAliasSize}} has been added,
along with the corresponding methods {{#getAliasSize}} and {{#isAliasEnabled}} (returns “true”
if aliasSize is not zero).
*HTableDescriptor*: new method {{#hasAliasEnabledFamily}} has been added (returns “true”
if one or more of the table’s families are aliasEnabled).
(Corresponding modifications were also made to {{TestHColumnDescriptor}} and {{TestHTableDescriptor}},
to test the new methods appropriately.)


_New class_:
*AliasEnabledConnection* (subclass of {{ConnectionImplementation}}): provides overrides of
{{#getTable}} and {{#getBufferedMutator}} to return objects of the {{AliasEnabledTable}} class
and the {{AliasEnabledBufferedMutator}} class, respectively.


_Modified class_:
*HTableMultiplexer*: new static method added -- {{#getAliasEnabledTableMultiplexer}}, returns
an {{HTableMultiplexer}} object that is actually an instance of the new subclass, {{AliasEnabledTableMultiplexer}}.


_New classes_:
*AliasEnabledTable* (subclass of {{HTable}})
*AliasEnabledBufferedMutator* (subclass of {{BufferedMutatorImpl}})
*AliasEnabledTableMultiplexer* (subclass of {{HTableMultiplexer}})
-- all the above contain overrides which allow for {{AliasManager}} methods to be invoked
when needed -- to perform qualifier-to-alias conversions (for {{Get}}, {{Scan}}, and {{Mutation}}
objects), and alias-to-qualifier conversions (for {{Result}} objects) -- for any Table for
which {{HTableDescriptor#hasAliasEnabledFamily}} is “true”.


_New class_:
*AliasManager*: performs all alias-oriented conversions -- qualifier-to-alias conversions
for queries and mutations, and alias-to-qualifier conversions for results. It fully encapsulates
all CRUD transactions against the {{aliasMappingTable}} (the HBase table in which qualifier-to-alias
mappings are persisted for each alias-enabled column family). When a {{Mutation}} object contains
a column-qualifier for which an alias entry does not yet exist, a new alias is generated and
stored in a qualifier-to-alias mapping entry in the {{aliasMappingTable}}. The first time
an {{AliasManager}} is instantiated against an HBase cluster, the {{aliasMappingTable}} will
be created if it does not already exist.
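
The conversion logic described above can be sketched in memory for a single alias-enabled column family. Everything in this block is hypothetical: {{InMemoryAliasMapSketch}} is not the patch's {{AliasManager}}, qualifiers are modeled as Strings rather than byte arrays, a plain counter stands in for the persisted mapping table, and the choices that aliases start at 1 and that reads never create aliases are assumptions of the sketch.

```java
import java.util.HashMap;
import java.util.Map;

// In-memory model of the mapping logic for ONE alias-enabled column family.
// Hypothetical: the AliasManager described above persists mappings in an HBase table.
final class InMemoryAliasMapSketch {
    private final Map<String, Long> qualifierToAlias = new HashMap<>();
    private final Map<Long, String> aliasToQualifier = new HashMap<>();
    private final long maxAlias; // largest alias value that fits in the family's aliasSize
    private long counter = 0;    // stands in for the persisted counter value

    InMemoryAliasMapSketch(int aliasSizeBytes) {
        if (aliasSizeBytes != 1 && aliasSizeBytes != 2 && aliasSizeBytes != 4) {
            throw new IllegalArgumentException("aliasSize must be 1, 2, or 4");
        }
        this.maxAlias = (1L << (8 * aliasSizeBytes)) - 1;
    }

    /** Mutation path: returns the existing alias or generates and stores a new one. */
    long aliasForWrite(String qualifier) {
        Long existing = qualifierToAlias.get(qualifier);
        if (existing != null) {
            return existing;
        }
        if (counter >= maxAlias) {
            throw new IllegalStateException("alias space exhausted for this family");
        }
        long alias = ++counter; // assumption: aliases start at 1
        qualifierToAlias.put(qualifier, alias);
        aliasToQualifier.put(alias, qualifier);
        return alias;
    }

    /** Query path (assumed): never creates an alias; null if the qualifier was never written. */
    Long aliasForRead(String qualifier) {
        return qualifierToAlias.get(qualifier);
    }

    /** Result path: converts a stored alias back to the user's qualifier. */
    String qualifierFor(long alias) {
        return aliasToQualifier.get(alias);
    }
}
```

In this model, writing the same qualifier twice returns the same alias, and a family with a 1-byte alias size rejects its 256th distinct qualifier.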


_New reserved HBase table_:
*aliasMappingTable*: Each row on the {{aliasMappingTable}} corresponds to an aliasEnabled
column-family on a user-table. The rowId of each {{aliasMappingTable}} row is in the format:
{{[fully-qualified-user-table-name + ":" + aliasEnabled-Family]}}. On each {{aliasMappingTable}}
row, the column with an EMPTY_BYTE_ARRAY column-qualifier is reserved for an Increment value
used to generate new unique alias values within the range stipulated by the aliasSize (1,
2, or 4 bytes) of the column-family. All other columns on an {{aliasMappingTable}} row are
key:value pairings which map a user-column-qualifier to its corresponding alias.
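
A minimal sketch of the row layout just described, assuming UTF-8 row keys and a big-endian, fixed-width encoding of the counter value into alias bytes. The class and method names are hypothetical, and the byte order and the exclusion of zero as an alias value are assumptions rather than details taken from the patch.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helpers illustrating the aliasMappingTable row layout.
final class AliasMappingRowSketch {

    /** Row key in the format described above: table name, ":", family name. */
    static byte[] rowKey(String fullyQualifiedTableName, String family) {
        return (fullyQualifiedTableName + ":" + family).getBytes(StandardCharsets.UTF_8);
    }

    /** Encodes a counter value as a fixed-width alias (big-endian byte order is assumed). */
    static byte[] encodeAlias(long counterValue, int aliasSizeBytes) {
        if (aliasSizeBytes != 1 && aliasSizeBytes != 2 && aliasSizeBytes != 4) {
            throw new IllegalArgumentException("aliasSize must be 1, 2, or 4");
        }
        long max = (1L << (8 * aliasSizeBytes)) - 1;
        if (counterValue < 1 || counterValue > max) {
            throw new IllegalArgumentException("counter value out of range for aliasSize");
        }
        byte[] alias = new byte[aliasSizeBytes];
        // Fill from the least significant byte backwards.
        for (int i = aliasSizeBytes - 1; i >= 0; i--) {
            alias[i] = (byte) (counterValue & 0xFF);
            counterValue >>>= 8;
        }
        return alias;
    }
}
```

For example, {{rowKey("ns:orders", "d")}} produces the bytes of {{ns:orders:d}}, and {{encodeAlias(258, 2)}} produces the two bytes 0x01 0x02.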


_Modified interface and class_:
*Admin*
*HBaseAdmin*
-- new method added, {{#deleteColumnFamilyAliases}}. While usage is not mandatory, this method
may be invoked to remove from the {{aliasMappingTable}} the row associated with a specific
column-family. It may only be successfully invoked after the column-family has been fully
deleted from its table, or after the table itself has been deleted.


_Modified class_:
*Get*: new package-protected method {{#setFamilyMap}} -- added to make {{Get}} consistent
with {{Scan}}, {{Mutation}}, etc. (which already have such a method); this was required to
allow {{AliasManager}} to cleanly produce alias-converted {{Get}} objects.


_Modified class_:
*ConnectionImplementation*: Inadvertent usage of standard {{HTable}} and {{BufferedMutatorImpl}}
objects against a Table with alias-enabled column-families would result in corruption of the
Table’s data. To prevent this, the methods {{#getTable}} and {{#getBufferedMutator}} were
modified with the addition of a call to the static method {{AliasManager#verifyConnectionForAliasEnabledTable}},
which throws an {{IllegalStateException}} if a Table is alias-enabled and the {{AliasEnabledConnection.class}}
is not assignable from the class of the current {{Connection}}.
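
The guard described above amounts to a {{Class#isAssignableFrom}} check. The sketch below uses stub classes in place of the hbase-client types and a boolean in place of the table-descriptor lookup, so the names and the method signature are illustrative assumptions, not the patch's actual code.

```java
// Stub classes standing in for the hbase-client types named above.
class ConnectionImplementationStub {}

class AliasEnabledConnectionStub extends ConnectionImplementationStub {}

final class ConnectionGuardSketch {

    /**
     * Throws if the table has an alias-enabled family but the connection is not
     * an alias-enabled connection (or a subclass of one).
     */
    static void verifyConnectionForAliasEnabledTable(boolean tableHasAliasEnabledFamily,
                                                     Object connection) {
        if (tableHasAliasEnabledFamily
                && !AliasEnabledConnectionStub.class.isAssignableFrom(connection.getClass())) {
            throw new IllegalStateException("Table is alias-enabled, but connection class "
                    + connection.getClass().getName() + " does not support aliasing");
        }
    }
}
```

A standard connection passes the check for ordinary tables and fails it only for alias-enabled tables, which matches the stated aim of leaving non-aliased usage unaffected.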


_Modified classes_:
*ConnectionImplementation*
*HTable*
-- the method {{ConnectionImplementation#getBufferedMutator}} was refactored into two separate
methods, with the original {{#getBufferedMutator}} method now calling a new package-protected
method called {{#getBufferedMutatorImpl}}. The HTable class internally uses a {{BufferedMutatorImpl}}
object to accomplish some of its processing, and its invocation of {{ConnectionImplementation#getBufferedMutator}}
needed to be changed to the new {{#getBufferedMutatorImpl}} method in order to ensure proper
functioning of both {{HTable}} and its new subclass, {{AliasEnabledTable}}. (Without this
change, {{AliasEnabledTable}} would incorrectly instantiate an internal {{AliasEnabledBufferedMutator}}
instead of the required standard {{BufferedMutatorImpl}}.)
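
The reason for this split can be shown with stub classes. In the sketch below (all names hypothetical), the table's internal mutator is obtained through the non-overridden {{getBufferedMutatorImpl}} method, so it stays a standard mutator even when the connection subclass overrides {{getBufferedMutator}}.

```java
// Stubs standing in for BufferedMutatorImpl and AliasEnabledBufferedMutator.
class MutatorStub {
    String kind() { return "standard"; }
}

class AliasMutatorStub extends MutatorStub {
    @Override
    String kind() { return "alias-enabled"; }
}

// Stub for ConnectionImplementation after the refactoring.
class ConnStub {
    /** Entry point used by callers; subclasses may override it. */
    MutatorStub getBufferedMutator() {
        return getBufferedMutatorImpl();
    }

    /** Always builds the standard implementation; not overridden by the subclass below. */
    MutatorStub getBufferedMutatorImpl() {
        return new MutatorStub();
    }
}

// Stub for AliasEnabledConnection.
class AliasConnStub extends ConnStub {
    @Override
    MutatorStub getBufferedMutator() {
        return new AliasMutatorStub();
    }
}

// Stub for HTable (and, by inheritance, AliasEnabledTable).
class TableStub {
    final MutatorStub internalMutator;

    TableStub(ConnStub connection) {
        // Calling connection.getBufferedMutator() here would hand back an
        // alias-enabled mutator whenever the connection is an AliasConnStub.
        this.internalMutator = connection.getBufferedMutatorImpl();
    }
}
```

Constructing {{new TableStub(new AliasConnStub())}} therefore yields a table whose internal mutator reports "standard", while callers of {{getBufferedMutator}} on the same connection still receive the alias-enabled variant.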


_Modified class_:
*TableName*: constant added for {{ALIAS_TABLE_NAME}}.


_Added TEST classes_:
-- Test classes were added (in the hbase-server subproject, which allows access to the {{HBaseTestingUtility}})
for all the new Alias* prefixed classes.
The most elaborate testing takes place in the {{TestAliasEnabledTable}} module, in which identical
sets of mutations and queries are submitted against both a standard (non-alias-enabled, “baseline”)
table and four other tables defined with various combinations of alias-enabled column-families.
Results from all alias-enabled families are exhaustively compared with the “baseline”
results to ensure that they are identical.


> Add column-aliasing capability to hbase-client
> ----------------------------------------------
>
>                 Key: HBASE-17257
>                 URL: https://issues.apache.org/jira/browse/HBASE-17257
>             Project: HBase
>          Issue Type: New Feature
>          Components: Client
>    Affects Versions: 2.0.0
>            Reporter: Daniel Vimont
>            Assignee: Daniel Vimont
>              Labels: features
>         Attachments: HBASE-17257-v2.patch, HBASE-17257-v3.patch, HBASE-17257.patch
>
>
> Review Board link: https://reviews.apache.org/r/54635/
> Column aliasing will provide the option for a 1, 2, or 4 byte alias value to be stored
in each cell of an "alias enabled" column-family, in place of the full-length column-qualifier.
Aliasing is intended to operate completely invisibly to the end-user developer, with absolutely
no "awareness" of aliasing required to be coded into a front-end application. No new public
hbase-client interfaces are to be introduced, and only a few new public methods should need
to be added to existing interfaces, primarily to allow an administrator to designate that
a new column-family is to be alias-enabled by setting its aliasSize attribute to 1, 2, or
4.
> To facilitate such functionality, new subclasses of HTable, BufferedMutatorImpl, and
HTableMultiplexer are to be provided. The overriding methods of these new subclasses will
invoke methods of the new AliasManager class to facilitate qualifier-to-alias conversions
(for user-submitted Gets, Scans, and Mutations) and alias-to-qualifier conversions (for Results
returned from HBase) for any Table that has one or more alias-enabled column families. All
conversion logic will be encapsulated in the new AliasManager class, and all qualifier-to-alias
mappings will be persisted in a new aliasMappingTable in a new, reserved namespace.
> An informal polling of HBase users at HBaseCon East and at the Strata/Hadoop-World conference
in Sept. 2016 showed that Column Aliasing could be a popular enhancement to standard HBase
functionality, due to the fact that full column-qualifiers are stored in each cell, and reducing
this qualifier storage requirement down to 1, 2, or 4 bytes per cell could prove beneficial
in terms of reduced storage and bandwidth needs. Aliasing is intended chiefly for column-families
which are of the "narrow and tall" variety (i.e., that are designed to use relatively few
distinct column-qualifiers throughout a large number of rows, throughout the lifespan of the
column-family). A column-family that is set up with an alias-size of 1 byte can contain up
to 255 unique column-qualifiers; a 2 byte alias-size allows for up to 65,535 unique column-qualifiers;
and a 4 byte alias-size allows for up to 4,294,967,295 unique column-qualifiers.
> Fuller specifications will be entered into the comments section below. Note that it may
well not be viable to add aliasing support in the new "async" classes that appear to be currently
under development.
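
The qualifier limits quoted in the description above (255, 65,535, and 4,294,967,295) are each one less than the number of bit patterns available in 1, 2, and 4 bytes. The sketch below reproduces that arithmetic; the class name is hypothetical, and the interpretation that one value (presumably zero) is left unused is an inference from the quoted figures, not a statement from the issue.

```java
// Worked arithmetic for the per-family qualifier limits quoted above.
final class AliasCapacitySketch {

    /** Number of distinct nonzero values representable in the given number of bytes. */
    static long maxQualifiers(int aliasSizeBytes) {
        return (1L << (8 * aliasSizeBytes)) - 1;
    }

    public static void main(String[] args) {
        for (int size : new int[] {1, 2, 4}) {
            System.out.println(size + " byte(s): " + maxQualifiers(size));
        }
    }
}
```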


