accumulo-dev mailing list archives

From Josh Elser <>
Subject Re: MultipleInputs with AccumuloInputFormat
Date Tue, 05 Nov 2013 17:15:54 GMT
Heh, ok.

I'm currently working through a bit of a prototype to see how it works.

I'm not a mapred/mapreduce expert, but I *think* I have an approach that 
will work. Keep an eye out for a Jira -- would love feedback.

On 11/5/13, 12:13 PM, Kevin Faro wrote:
> I recently looked into that and came to the same realization.
> I ended up writing a new input format that did the Cartesian product of two
> tables.  But to do that I had to store values for the left configuration
> and right configuration and then copy over whichever config settings I
> wanted to use for the AIF depending on which split I needed in the
> RecordReader.
> It would have been awesome if I could have just used the MultipleInputs ...
> --Kevin
> On Tue, Nov 5, 2013 at 10:24 AM, Josh Elser <> wrote:
>> In executing some MapReduce over Accumulo with the AccumuloInputFormat, I
>> came to the realization that AIF fundamentally doesn't work with concepts
>> like MultipleInputs in Hadoop (
>> docs/current/api/org/apache/hadoop/mapreduce/lib/input/MultipleInputs.html).
>> Given that you can only write one set of configuration for AIF into a
>> Configuration object, there's not a mechanism to support multiple. This
>> appears to be the case across all versions.
>> Is this correct? Have I overlooked something?
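
To make the limitation described above concrete, here is a minimal sketch in plain Java. It is not the real AccumuloInputFormat API: a HashMap stands in for Hadoop's Configuration object, and the key name and helper methods are hypothetical, chosen only for illustration. It shows why a single fixed key can hold only one input's settings, and how per-input key names (one possible approach, assumed here, not taken from the thread) would avoid the overwrite.

```java
import java.util.HashMap;
import java.util.Map;

final class ConfigCollisionSketch {
    // Hypothetical key name; the real input format uses its own keys.
    static final String TABLE_KEY = "sketch.inputformat.table";

    // One fixed key per setting: a second call overwrites the first.
    static void setInputTable(Map<String, String> conf, String table) {
        conf.put(TABLE_KEY, table);
    }

    static String getInputTable(Map<String, String> conf) {
        return conf.get(TABLE_KEY);
    }

    // Assumed alternative: qualify the key with a caller-chosen input name,
    // so each input keeps its own value in the same configuration.
    static void setInputTable(Map<String, String> conf, String inputName, String table) {
        conf.put(TABLE_KEY + "." + inputName, table);
    }

    static String getInputTable(Map<String, String> conf, String inputName) {
        return conf.get(TABLE_KEY + "." + inputName);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();

        // Configuring two tables through the single key keeps only the last one.
        setInputTable(conf, "left_table");
        setInputTable(conf, "right_table");
        System.out.println(getInputTable(conf));

        // With per-input keys, both settings survive side by side.
        setInputTable(conf, "left", "left_table");
        setInputTable(conf, "right", "right_table");
        System.out.println(getInputTable(conf, "left") + " / " + getInputTable(conf, "right"));
    }
}
```

A record reader in such a scheme would still need to know which input name its split belongs to, which is roughly the bookkeeping Kevin describes doing by hand.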
