accumulo-notifications mailing list archives

From "Corey J. Nolet (JIRA)" <>
Subject [jira] [Commented] (ACCUMULO-391) Multi-table Accumulo input format
Date Wed, 25 Sep 2013 16:04:05 GMT


Corey J. Nolet commented on ACCUMULO-391:

I don't like forcing the user to follow a specific configuration order. I know they'll only
need to configure it once, but it's a tedious trial-and-error process until they either pull
down the codebase or figure out the right order in which to call the methods. Perhaps a clear
warning during getInputSplits() or initialize() in the mappers would be enough for someone
to see in the logs why their job failed. I agree with William: they'll only need to do this
once in most cases.
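
A minimal sketch of what such a warning could look like, assuming the job settings are available as a plain string map. The class name, method names, and the three setting keys are made up for illustration and are not part of the Accumulo API. The check only looks at the final state, so it does not care about the order in which the settings were applied.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.logging.Logger;

// Hypothetical helper for illustration; not an Accumulo class.
class InputConfigCheck {
    private static final Logger LOG = Logger.getLogger(InputConfigCheck.class.getName());

    // Assumed setting names; a real input format would use its own keys.
    static final String[] REQUIRED = {"instance", "principal", "tables"};

    // Returns one readable message per missing or empty required setting.
    static List<String> findProblems(Map<String, String> conf) {
        List<String> problems = new ArrayList<>();
        for (String key : REQUIRED) {
            String value = conf.get(key);
            if (value == null || value.isEmpty()) {
                problems.add("Missing required input format setting '" + key + "'");
            }
        }
        return problems;
    }

    // Logs every problem as a warning and reports whether the configuration is usable.
    // Intended to run at the start of split calculation or mapper setup.
    static boolean warnIfInvalid(Map<String, String> conf) {
        List<String> problems = findProblems(conf);
        for (String problem : problems) {
            LOG.warning(problem);
        }
        return problems.isEmpty();
    }
}
```

Reporting every missing setting at once, rather than failing on the first, is what would spare the user the trial-and-error loop.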

On the other topic: the iterators, ranges, and columns are inherently tied to a table. In
the case of a single-table input format, I can see why separate methods could be used. I like
the idea of having a TableConfiguration object that has the iterators, ranges, and columns
serialized within it. It would simplify the API immensely and ease the concern about whether each
configuration is in a valid state by the time the getInputSplits() method is called. Perhaps
this could also be used in the MultiTableBatchScanner implementation.
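
A rough sketch of the idea, under heavy assumptions: the class below is hypothetical, and plain strings stand in for Accumulo's iterator settings, ranges, and column pairs. It only illustrates bundling the per-table settings into one object that can be serialized and restored as a unit.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the proposed TableConfiguration; not an Accumulo class.
class TableConfigSketch {
    final String tableName;
    final List<String> iterators = new ArrayList<>();
    final List<String> ranges = new ArrayList<>();
    final List<String> columns = new ArrayList<>();

    TableConfigSketch(String tableName) {
        // Rejecting a missing name here means an instance can never exist without its table.
        if (tableName == null || tableName.isEmpty()) {
            throw new IllegalArgumentException("table name is required");
        }
        this.tableName = tableName;
    }

    // Writes the table name followed by the three length-prefixed lists.
    byte[] serialize() {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(tableName);
            writeList(out, iterators);
            writeList(out, ranges);
            writeList(out, columns);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Reads the fields back in the same order they were written.
    static TableConfigSketch deserialize(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            TableConfigSketch cfg = new TableConfigSketch(in.readUTF());
            readList(in, cfg.iterators);
            readList(in, cfg.ranges);
            readList(in, cfg.columns);
            return cfg;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static void writeList(DataOutputStream out, List<String> items) throws IOException {
        out.writeInt(items.size());
        for (String item : items) {
            out.writeUTF(item);
        }
    }

    private static void readList(DataInputStream in, List<String> target) throws IOException {
        int count = in.readInt();
        for (int i = 0; i < count; i++) {
            target.add(in.readUTF());
        }
    }
}
```

Because everything for one table travels together, there is no point at which the ranges for a table exist without the table itself being known.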

That's a significant API change to introduce in 1.6.0. We could preserve backwards compatibility
by having the current single-table setter methods hydrate a TableConfiguration
object under the hood that could be treated as a "default table".
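
One way the "default table" approach might look, as a sketch only. Every name below is hypothetical and strings stand in for real range objects. The legacy-style setters write into a single default configuration, the new-style call configures any number of named tables, and both are merged when the job is resolved.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; none of these names come from the Accumulo API.
class MultiTableJobConfig {
    // Per-table settings, reduced to ranges to keep the example short.
    static class PerTable {
        final List<String> ranges = new ArrayList<>();
    }

    private final Map<String, PerTable> tables = new LinkedHashMap<>();
    private final PerTable defaultConfig = new PerTable();
    private String defaultTableName; // stays null until the legacy setter is used

    // New-style API: fetch or create the configuration for a named table.
    PerTable table(String name) {
        return tables.computeIfAbsent(name, k -> new PerTable());
    }

    // Legacy-style setter: names the default table.
    void setInputTableName(String name) {
        defaultTableName = name;
    }

    // Legacy-style setter: stores ranges on the default table, whether or not it is named yet.
    void setRanges(List<String> ranges) {
        defaultConfig.ranges.clear();
        defaultConfig.ranges.addAll(ranges);
    }

    // Merges named tables with the default table; validity is checked once, here.
    Map<String, PerTable> resolve() {
        Map<String, PerTable> all = new LinkedHashMap<>(tables);
        if (defaultTableName != null) {
            // The legacy configuration replaces any entry with the same name.
            all.put(defaultTableName, defaultConfig);
        } else if (!defaultConfig.ranges.isEmpty()) {
            throw new IllegalStateException("ranges were set but no input table name was given");
        }
        return all;
    }
}
```

Existing single-table callers would keep calling the old setters in any order, while new callers would go through table(...) directly.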

> Multi-table Accumulo input format
> ---------------------------------
>                 Key: ACCUMULO-391
>                 URL:
>             Project: Accumulo
>          Issue Type: New Feature
>            Reporter: John Vines
>            Assignee: Corey J. Nolet
>            Priority: Minor
>              Labels: mapreduce,
>             Fix For: 1.6.0
>         Attachments: ACCUMULO-391.patch, multi-table-if.patch, new-multitable-if.patch
> Just realized we had no MR input method which supports multiple Tables for an input format.
> I would see it making the table the mapper's key and making the Key/Value a tuple, or alternatively
> have the Table/Key be the key tuple and stick with Values being the value.

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators