accumulo-notifications mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Christopher Tubbs (JIRA)" <>
Subject [jira] [Commented] (ACCUMULO-391) Multi-table Accumulo input format
Date Mon, 07 Oct 2013 22:19:42 GMT


Christopher Tubbs commented on ACCUMULO-391:

A few issues:

* shouldUseLocalIterators was added as a deprecated, public method
** it didn't exist in a prior version, so it shouldn't be deprecated. If it's not needed,
it should be removed.
** our internal code still uses it, though it's deprecated. We shouldn't use our own deprecated

* getters changed without deprecation
** setupIterators
** getTabletLocator

* TableQueryConfig was placed in o.a.a.core.conf
** o.a.a.core.conf isn't really part of the public API; it's essentially for server-side configuration
representation, though we use it internal to some client code
** precedent for narrowly scoped config is in the package in which its corresponding code
exists (see o.a.a.core.client.BatchWriterConfig)

* Javadoc
** Empty @return statements
** Javadocs advise using deprecated methods
** Unnecessary change of variable name "context" in getSplits to "conf", with incorrect description
based on new name rather than the object type
** Incorrect Javadoc description on setTableQueryConfigs. It is not setting the objects on
a Hadoop configuration. It is setting the configuration on the job. This is a minor thing,
but the additional precision in language goes a long way towards clear documentation, especially
for non-native English speakers and people less familiar with the way MapReduce works in Hadoop.

* TableQueryConfig
** For consistency and clarity, this should be named to match our other query code. Perhaps
instead of "TableQueryConfig", it might be better to call it BatchScanConfig, similar to BatchWriterConfig.
** Instead of the ambiguous setTableQueryConfigs, perhaps a method called Map<String, BatchScanConfig>,
to make this object more re-usable.
** getter should be protected

Overall, given the significance of the changes to the API of the MapReduce code, for a limited
application (most existing users of this class will probably continue to only scan one table
at a time), I think it'd be better to put this code in a separate InputFormat class. This
should be especially concerning because we've recently just stabilized the M/R code in 1.5.0,
ironing out existing issues, and this is a bit too disruptive (deprecating brand new methods
in 1.5, like setTableName and setRanges, for instance).

Another reason this might be good as a separate class... is that we could actually have the
existing single table version extend the multi-table version as a specialized case. That would
save on maintenance costs of two implementations, but leave the existing stable and familiar
API in tact until the new one is proven and stable. And then (if we really wanted to) we could
deprecate the single table version as a whole, rather than deprecating half of it, and trying
to maintain a half-deprecated half-current class.

> Multi-table Accumulo input format
> ---------------------------------
>                 Key: ACCUMULO-391
>                 URL:
>             Project: Accumulo
>          Issue Type: New Feature
>            Reporter: John Vines
>            Assignee: Corey J. Nolet
>            Priority: Minor
>              Labels: mapreduce,
>             Fix For: 1.6.0
>         Attachments: ACCUMULO-391.patch, multi-table-if.patch, new-multitable-if.patch
> Just realized we had no MR input method which supports multiple Tables for an input format.
I would see it making the table the mapper's key and making the Key/Value a tuple, or alternatively
have the Table/Key be the key tuple and stick with Values being the value.

This message was sent by Atlassian JIRA

View raw message