accumulo-notifications mailing list archives

From "Corey J. Nolet (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (ACCUMULO-391) Multi-table Accumulo input format
Date Wed, 25 Sep 2013 16:04:09 GMT

    [ https://issues.apache.org/jira/browse/ACCUMULO-391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13777678#comment-13777678
] 

Corey J. Nolet edited comment on ACCUMULO-391 at 9/25/13 4:04 PM:
------------------------------------------------------------------

I don't like forcing the user to follow a specific configuration order. I know they'll only
need to configure it once, but that's a tedious trial-and-error process until they either pull
down the codebase or figure out the right order in which to call the methods. Perhaps a clear
warning (or an exception) during getInputSplits() or initialize() in the mappers would be
enough for someone to see in the logs why their job failed. I agree with William: they'll
only need to do this once in most cases.
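A minimal sketch of the order-independent approach, assuming a simplified stand-in for the job configuration: the setters only record values, so they can be called in any order, and all checking happens in one validate() step that getInputSplits() or initialize() would call. It fails with a single message naming every missing setting. InputSettings and its methods are hypothetical and are not Accumulo's API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical, simplified stand-in for input-format configuration.
 * Setters only record values, so call order does not matter; nothing is
 * checked until validate() runs (e.g. from split computation).
 */
final class InputSettings {
  private static final String[] REQUIRED = {"instance", "user", "table"};
  private final Map<String, String> values = new LinkedHashMap<>();

  InputSettings setInstance(String name) { values.put("instance", name); return this; }
  InputSettings setUser(String user) { values.put("user", user); return this; }
  InputSettings setTable(String table) { values.put("table", table); return this; }

  /** Returns the required settings that were never supplied, in a fixed order. */
  List<String> missing() {
    List<String> out = new ArrayList<>();
    for (String key : REQUIRED) {
      if (!values.containsKey(key)) out.add(key);
    }
    return out;
  }

  /** Fails fast with one message that names every missing setting. */
  void validate() {
    List<String> missing = missing();
    if (!missing.isEmpty()) {
      throw new IllegalStateException("Input format is not fully configured; missing: " + missing);
    }
  }
}
```

Because every problem is reported at once, a user sees the complete list of what to fix in one log line instead of discovering the required order one failure at a time.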

On the other topic: the iterators, ranges, and columns are inherently tied to a table. In
the case of a single-table input format, I can see why separate methods could be used. I like
the idea of having a TableConfiguration object that has the iterators, ranges, and columns
serialized within it. It would simplify the API immensely and would also address the concern
that each configuration be in a valid state by the time the getInputSplits() method is called.
Perhaps this could also be used in the MultiTableBatchScanner implementation.
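To make the TableConfiguration idea concrete, here is a sketch of a per-table bundle that holds iterators, ranges, and columns and serializes to a single Base64 string (for example, to store under one job-configuration key). TableConfig, its fields, and the string encodings of ranges and columns are hypothetical simplifications; a real version would hold Accumulo's IteratorSetting and Range objects and proper column types rather than strings. Plain Java serialization is used only because it ships with the JDK.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Base64;

/**
 * Hypothetical per-table bundle: the iterators, ranges, and columns for one
 * table travel together, so no setter depends on another being called first.
 */
final class TableConfig implements Serializable {
  private static final long serialVersionUID = 1L;

  final String table;
  final ArrayList<String> iterators = new ArrayList<>(); // iterator class names, in the order added
  final ArrayList<String> ranges = new ArrayList<>();    // row ranges such as "a..m" (illustrative encoding)
  final ArrayList<String> columns = new ArrayList<>();   // "family" or "family:qualifier" (illustrative)

  TableConfig(String table) {
    if (table == null || table.isEmpty()) throw new IllegalArgumentException("table name required");
    this.table = table;
  }

  TableConfig addIterator(String className) { iterators.add(className); return this; }
  TableConfig addRange(String range) { ranges.add(range); return this; }
  TableConfig fetchColumn(String column) { columns.add(column); return this; }

  /** Encodes the whole bundle as one Base64 string, e.g. for a single job-configuration entry. */
  String serialize() {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
      out.writeObject(this); // closing the stream flushes everything into 'bytes'
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return Base64.getEncoder().encodeToString(bytes.toByteArray());
  }

  /** Inverse of serialize(). */
  static TableConfig deserialize(String encoded) {
    byte[] bytes = Base64.getDecoder().decode(encoded);
    try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
      return (TableConfig) in.readObject();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    } catch (ClassNotFoundException e) {
      throw new IllegalStateException(e);
    }
  }
}
```

A multi-table job could then keep a map from table name to TableConfig and check each entry once, when splits are computed.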

That's a significant API change to introduce in 1.6.0. We could preserve backwards compatibility
by having the current single-table setter methods populate a TableConfiguration object under
the hood that would be treated as a "default table".
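The backwards-compatibility idea can be sketched the same way, assuming hypothetical stand-in types: legacy single-table setters write into a lazily created "default table" entry, new code registers additional per-table entries, and everything is resolved and checked into one table-name map at split time. Because the default entry's name may be filled in last, the legacy setters stay order-independent. InputConfig, TableSpec, and their methods are illustrative names, not Accumulo's.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical per-table settings (a trimmed-down stand-in for TableConfiguration). */
final class TableSpec {
  String table;                                  // may stay null until a legacy setter names it
  final List<String> ranges = new ArrayList<>(); // row ranges as strings, for illustration only
}

/**
 * Hypothetical job-level input settings. Legacy single-table setters write into
 * a lazily created "default table" entry; new code registers one TableSpec per table.
 */
final class InputConfig {
  private TableSpec defaultSpec;                              // null until a legacy setter is used
  private final Map<String, TableSpec> named = new LinkedHashMap<>();

  private TableSpec defaultSpec() {
    if (defaultSpec == null) defaultSpec = new TableSpec();
    return defaultSpec;
  }

  // Legacy single-table style: these may be called in either order.
  void setInputTable(String table) { defaultSpec().table = table; }

  void setRanges(Collection<String> ranges) {
    defaultSpec().ranges.clear();
    defaultSpec().ranges.addAll(ranges);
  }

  // Multi-table style: each spec must already carry its table name.
  void addTable(TableSpec spec) {
    if (spec.table == null) throw new IllegalArgumentException("table name required");
    named.put(spec.table, spec);
  }

  /** Merges the default entry and the named entries, checking completeness; call at split time. */
  Map<String, TableSpec> resolve() {
    Map<String, TableSpec> all = new LinkedHashMap<>();
    if (defaultSpec != null) {
      if (defaultSpec.table == null) {
        throw new IllegalStateException("input settings were given but no input table was named");
      }
      all.put(defaultSpec.table, defaultSpec);
    }
    for (TableSpec spec : named.values()) {
      if (all.putIfAbsent(spec.table, spec) != null) {
        throw new IllegalStateException("table configured twice: " + spec.table);
      }
    }
    if (all.isEmpty()) throw new IllegalStateException("no input tables configured");
    return all;
  }
}
```

Existing single-table jobs would keep calling only the legacy setters and resolve to a one-entry map, so the split logic only ever has to handle the multi-table case.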
                
      was (Author: sonixbp):
    I don't like forcing the user to follow a specific configuration order. I know they'll
only need to configure it once, but that's a tedious trial-and-error process until they either
pull down the codebase or figure out the right order in which to call the methods. Perhaps
a clear warning during getInputSplits() or initialize() in the mappers would be enough for
someone to see in the logs why their job failed. I agree with William: they'll only need
to do this once in most cases.

On the other topic: the iterators, ranges, and columns are inherently tied to a table. In
the case of a single-table input format, I can see why separate methods could be used. I like
the idea of having a TableConfiguration object that has the iterators, ranges, and columns
serialized within it. It would simplify the API immensely and would also address the concern
that each configuration be in a valid state by the time the getInputSplits() method is called.
Perhaps this could also be used in the MultiTableBatchScanner implementation.

That's a significant API change to introduce in 1.6.0. We could preserve backwards compatibility
by having the current single-table setter methods populate a TableConfiguration object under
the hood that would be treated as a "default table".
                  
> Multi-table Accumulo input format
> ---------------------------------
>
>                 Key: ACCUMULO-391
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-391
>             Project: Accumulo
>          Issue Type: New Feature
>            Reporter: John Vines
>            Assignee: Corey J. Nolet
>            Priority: Minor
>              Labels: mapreduce
>             Fix For: 1.6.0
>
>         Attachments: ACCUMULO-391.patch, multi-table-if.patch, new-multitable-if.patch
>
>
> Just realized we had no MR input format that supports multiple tables.
> I could see it making the table the mapper's key and making the Key/Value a tuple, or alternatively
> having the Table/Key tuple be the key and sticking with Values as the value.
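The second option in the description, using the (table, key) pair as the map key and keeping the Value as the map value, could look like the composite key below, which sorts by table first so records from one table stay grouped. TableKey is hypothetical and uses a String in place of Accumulo's Key; a real MapReduce key type would typically also need to implement Hadoop's WritableComparable, which is left out here to keep the sketch dependency-free.

```java
import java.util.Objects;

/**
 * Hypothetical composite map key: (table, key), ordered by table first and then
 * by key, so all records from one table sort next to each other.
 */
final class TableKey implements Comparable<TableKey> {
  final String table;
  final String rowKey; // stand-in for an Accumulo Key

  TableKey(String table, String rowKey) {
    this.table = Objects.requireNonNull(table);
    this.rowKey = Objects.requireNonNull(rowKey);
  }

  @Override
  public int compareTo(TableKey other) {
    int byTable = table.compareTo(other.table);
    return byTable != 0 ? byTable : rowKey.compareTo(other.rowKey);
  }

  @Override
  public boolean equals(Object o) {
    if (!(o instanceof TableKey)) return false;
    TableKey k = (TableKey) o;
    return table.equals(k.table) && rowKey.equals(k.rowKey);
  }

  @Override
  public int hashCode() { return Objects.hash(table, rowKey); }

  @Override
  public String toString() { return table + "/" + rowKey; }
}
```

The first option (table as the key, Key/Value tuple as the value) trades this ordering for a simpler key; which one fits depends on whether downstream reducers need records grouped per table.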

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
