accumulo-dev mailing list archives

From "Billie Rinaldi (JIRA)" <>
Subject [jira] [Commented] (ACCUMULO-391) Multi-table Accumulo input format
Date Wed, 11 Jul 2012 16:06:35 GMT


Billie Rinaldi commented on ACCUMULO-391:

I think we could probably expand the existing InputFormatBase to cover the multi-table case.
This would require making columns, ranges, and iterators per-table.  Columns and iterators
are only accessed on a per-table basis, so the table could be encoded in the property key
and the value could be left the same, e.g.
conf.set(ITERATORS + "." + new String(Base64.encodeBase64(tableName.getBytes())), iterators).
(Although I think in the case of iterators we should get rid of the separate
iterators and iterator options properties and just have one combined property.  I'd also like
to see more standardization in the encodings we're using for property values.)  The ranges
are pulled from the configuration all at once, so we should leave them under the RANGES property
key and have either a hierarchical structure in the value, or a flat structure where the table
name is included with each range.  I would suggest new methods to replace the existing ones
of the same names:

void setInputInfo(Configuration conf, String user, byte[] passwd, Authorizations auths)
void setRanges(Configuration conf, Text tableName, Collection<Range> ranges)
void fetchColumns(Configuration conf, Text tableName, Collection<Pair<Text,Text>> columnFamilyColumnQualifierPairs)
void addIterator(Configuration conf, Text tableName, IteratorSetting cfg)
TabletLocator getTabletLocator(Configuration conf, String tableName)
Map<Text,List<Range>> getRanges(Configuration conf)
Set<Pair<Text,Text>> getFetchedColumns(Configuration conf, String tableName)
List<IteratorSetting> getIterators(Configuration conf, String tableName)
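
To make the two storage ideas above concrete, here is a rough sketch, not a proposed patch. It assumes a plain Map in place of the Hadoop Configuration, treats ranges as opaque strings instead of serialized Range objects, and uses java.util.Base64 rather than commons-codec. The class name, the key strings, and the "," and ":" separators are all made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Base64;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: a Map stands in for the Hadoop Configuration and
// ranges are opaque strings rather than serialized Range objects.
class PerTableConfSketch {
    static final String ITERATORS = "AccumuloInputFormat.iterators"; // illustrative key prefix
    static final String RANGES = "AccumuloInputFormat.ranges";       // illustrative key

    static String b64(String s) {
        return Base64.getEncoder().encodeToString(s.getBytes(StandardCharsets.UTF_8));
    }

    static String unb64(String s) {
        return new String(Base64.getDecoder().decode(s), StandardCharsets.UTF_8);
    }

    // Per-table key: the table name is Base64-encoded, so no character in it
    // can be confused with the "." that separates the parts of the key.
    static String tableKey(String prefix, String tableName) {
        return prefix + "." + b64(tableName);
    }

    // Flat structure: every entry under RANGES carries its own table name.
    static void addRange(Map<String, String> conf, String tableName, String range) {
        String entry = b64(tableName) + ":" + b64(range);
        conf.merge(RANGES, entry, (old, add) -> old + "," + add);
    }

    // All ranges are read back in one pass and grouped by table.
    static Map<String, List<String>> getRanges(Map<String, String> conf) {
        Map<String, List<String>> byTable = new LinkedHashMap<>();
        String value = conf.get(RANGES);
        if (value == null || value.isEmpty()) {
            return byTable;
        }
        for (String entry : value.split(",")) {
            String[] parts = entry.split(":", 2);
            byTable.computeIfAbsent(unb64(parts[0]), t -> new ArrayList<>())
                   .add(unb64(parts[1]));
        }
        return byTable;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(tableKey(ITERATORS, "table.one"), "encodedIteratorList");
        addRange(conf, "table.one", "a-f");
        addRange(conf, "table2", "m-z");
        System.out.println(conf.keySet());
        System.out.println(getRanges(conf));
    }
}
```

The Base64 alphabet contains none of ".", "," or ":", which is why those characters are safe as separators in this sketch.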

To provide backwards compatibility, we could also keep the old setInputInfo/setRanges/fetchColumns/addIterator
methods and have a concept of a default table specified in setInputInfo that will be the table
used whenever a table isn't specified for setRanges/fetchColumns/addIterator.
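
The default-table fallback could look roughly like the following. This is only a sketch under simplifying assumptions: a Map stands in for the Configuration, iterators are plain names instead of IteratorSetting objects, table names are used in keys without encoding, and the class name, DEFAULT_TABLE key, and setDefaultTable helper are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of old-style methods delegating to a default table.
class DefaultTableSketch {
    static final String DEFAULT_TABLE = "AccumuloInputFormat.defaultTable"; // made-up key

    // Stand-in for the table argument of the old setInputInfo: remembers
    // which table the table-less methods should apply to.
    static void setDefaultTable(Map<String, String> conf, String tableName) {
        conf.put(DEFAULT_TABLE, tableName);
    }

    // New-style method: the table is named explicitly.
    static void addIterator(Map<String, String> conf, String tableName, String iteratorName) {
        conf.merge("iterators." + tableName, iteratorName, (old, add) -> old + "," + add);
    }

    // Old-style method: no table given, so fall back to the default table.
    static void addIterator(Map<String, String> conf, String iteratorName) {
        String tableName = conf.get(DEFAULT_TABLE);
        if (tableName == null) {
            throw new IllegalStateException("no default table has been set");
        }
        addIterator(conf, tableName, iteratorName);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        setDefaultTable(conf, "mytable");
        addIterator(conf, "versioning");          // applies to the default table
        addIterator(conf, "other", "summing");    // applies to an explicit table
        System.out.println(conf);
    }
}
```

With this shape, existing single-table jobs keep calling the old signatures unchanged, and only multi-table jobs need the new overloads.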
> Multi-table Accumulo input format
> ---------------------------------
>                 Key: ACCUMULO-391
>                 URL:
>             Project: Accumulo
>          Issue Type: New Feature
>    Affects Versions: 1.5.0-SNAPSHOT
>            Reporter: John Vines
>            Priority: Minor
>              Labels: mapreduce,
>         Attachments: multi-table-if.patch
> Just realized we had no MR input method which supports multiple Tables for an input format.
> I would see it making the table the mapper's key and making the Key/Value a tuple, or alternatively
> have the Table/Key be the key tuple and stick with Values being the value.


