accumulo-notifications mailing list archives

From "Pradeep Gollakota (JIRA)" <>
Subject [jira] [Commented] (ACCUMULO-391) Multi-table Accumulo input format
Date Wed, 05 Jun 2013 15:42:20 GMT


Pradeep Gollakota commented on ACCUMULO-391:

This would be a great addition.

We have just started using Pig with Accumulo at my company. The first thing we noticed is that
in many cases where we join data from one Accumulo table with data from another, we first have
to dump both tables to HDFS (perhaps using PigStorage), load the data back, and then join it.
This is because the scan information is encoded in the job configuration, so when Pig uses the
MultiInputFormat to scan both tables in the same job, only one table ends up getting exported
from Accumulo.
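To illustrate the collision described above, here is a minimal sketch. It uses a plain Map as a stand-in for the job configuration, and the key name `sketch.input.table` is hypothetical (it is not Accumulo's real property name). The per-table variant is only one assumed way a multi-table format might keep settings apart, not the approach taken in the attached patches.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ConfigCollisionSketch {
    // Hypothetical key name, for illustration only; not a real Accumulo property.
    static final String TABLE_KEY = "sketch.input.table";

    // Single-table style: every call writes the same key, so the last caller wins.
    static void setInputTable(Map<String, String> conf, String table) {
        conf.put(TABLE_KEY, table);
    }

    // Per-table style (assumed design): the table name is part of the key,
    // so settings for several tables can coexist in one configuration.
    static void addInputTable(Map<String, String> conf, String table, String columns) {
        conf.put(TABLE_KEY + "." + table + ".columns", columns);
    }

    public static void main(String[] args) {
        // Two inputs configured against one shared job configuration.
        Map<String, String> shared = new LinkedHashMap<>();
        setInputTable(shared, "users");
        setInputTable(shared, "orders"); // silently overwrites "users"
        System.out.println(shared.get(TABLE_KEY)); // prints "orders"

        Map<String, String> perTable = new LinkedHashMap<>();
        addInputTable(perTable, "users", "name");
        addInputTable(perTable, "orders", "total");
        System.out.println(perTable.size()); // prints 2
    }
}
```

The table names and column values here are made up; the point is only that a single shared key cannot describe two scans at once.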

If this is completed, we could use the MultiTableInputFormat instead of Accumulo(Row)InputFormat
to optimize our Pig scripts.

Any thoughts on when this would be included?

> Multi-table Accumulo input format
> ---------------------------------
>                 Key: ACCUMULO-391
>                 URL:
>             Project: Accumulo
>          Issue Type: New Feature
>            Reporter: John Vines
>            Assignee: William Slacum
>            Priority: Minor
>              Labels: mapreduce,
>         Attachments: multi-table-if.patch, new-multitable-if.patch
> Just realized we had no MR input method which supports multiple Tables for an input format.
> I would see it making the table the mapper's key and making the Key/Value a tuple, or alternatively
> have the Table/Key be the key tuple and stick with Values being the value.
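
The two record shapes proposed in the description can be sketched roughly as follows. This is only an illustration: Accumulo's Key and Value are reduced to plain strings, and the class and method names (`MultiTableRecordShapes`, `optionA`, `optionB`) are hypothetical, not taken from the attached patches.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.Map;

public class MultiTableRecordShapes {
    // Option A: mapper key = table name, mapper value = (Key, Value) tuple.
    static Map.Entry<String, Map.Entry<String, String>> optionA(
            String table, String key, String value) {
        Map.Entry<String, String> keyValue = new SimpleImmutableEntry<>(key, value);
        return new SimpleImmutableEntry<>(table, keyValue);
    }

    // Option B: mapper key = (table, Key) tuple, mapper value = Value.
    static Map.Entry<Map.Entry<String, String>, String> optionB(
            String table, String key, String value) {
        Map.Entry<String, String> tableKey = new SimpleImmutableEntry<>(table, key);
        return new SimpleImmutableEntry<>(tableKey, value);
    }

    public static void main(String[] args) {
        // Example values are made up for the sketch.
        System.out.println(optionA("users", "row1", "alice").getKey());
        System.out.println(optionB("users", "row1", "alice").getKey().getKey());
    }
}
```

With option A a mapper can branch on the table name directly from its key; with option B the value type stays the same as in the single-table input format, and the table travels alongside the Key.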

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see:
