accumulo-notifications mailing list archives

From "Josh Elser (JIRA)" <>
Subject [jira] [Commented] (ACCUMULO-1854) Accumulo{Input,Output}Format can't handle multiple configurations
Date Thu, 07 Nov 2013 05:30:18 GMT


Josh Elser commented on ACCUMULO-1854:

I was talking to Christopher about this tonight. He raised the fair question of why not
just use the AccumuloMultiTableInputFormat. One point we came to was that making these changes
would allow a single M/R job to talk to separate Accumulo clusters instead of only one cluster.

I did settle on a change that I'm not completely happy with: it relies on the fact that
splits are generated serially by one host. If they were generated in parallel, my approach
would break. However, given that the InputFormat can't rely on getting the same Configuration
object in each invocation of getSplits, the only other reliable approach I could come up with
was to use something like HDFS, which has its own concurrency issues. Since parallel split
generation isn't an issue now, I've punted on worrying about it.
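
The comment does not spell out the change itself, so the following is only a sketch of the
general hazard, not the actual patch. It assumes a hypothetical scheme in which each call is
matched to its settings purely by call order; the class and method names are made up for
illustration and are not Accumulo or Hadoop APIs.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only: settings are matched to callers by call order.
final class CallOrderSketch {
    private final List<String> tables;
    // Advanced once per call. Deliberately unsynchronized: the scheme assumes
    // that a single thread makes the calls one after another.
    private int next = 0;

    CallOrderSketch(List<String> tables) {
        this.tables = tables;
    }

    // Returns the table for "this" call, identified by nothing except its position
    // in the sequence of calls.
    String tableForNextCall() {
        return tables.get(next++);
    }

    public static void main(String[] args) {
        CallOrderSketch sketch = new CallOrderSketch(Arrays.asList("tableA", "tableB"));
        System.out.println(sketch.tableForNextCall()); // prints "tableA"
        System.out.println(sketch.tableForNextCall()); // prints "tableB"
    }
}
```

With serial calls in a known order, each caller gets the entry meant for it. If the calls
could interleave or arrive in a different order, nothing in the scheme would tie a caller to
its entry, which is the kind of breakage described above.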

> Accumulo{Input,Output}Format can't handle multiple configurations
> -----------------------------------------------------------------
>                 Key: ACCUMULO-1854
>                 URL:
>             Project: Accumulo
>          Issue Type: Bug
>    Affects Versions: 1.4.4, 1.5.0
>            Reporter: Josh Elser
>            Assignee: Josh Elser
>             Fix For: 1.4.5, 1.5.1, 1.6.1
> I noticed that I was unable to properly use MultipleInputs (or any code which uses a
> similar approach) with the AccumuloInputFormat class because of the way it builds up information
> in the Configuration object.
> It would be useful to be able to have multiple instances of AIF (and AOF) configured
> within one Job (Configuration).
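
A minimal sketch of the kind of collision the issue describes, assuming the settings live
under fixed keys in one shared configuration. A plain Map stands in for Hadoop's
Configuration, and the key name and helper methods are hypothetical, not Accumulo's real
ones. The per-input key suffix is just one possible way around the collision, not
necessarily the fix that was applied.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in: a Map plays the role of the shared job Configuration.
final class SharedConfigSketch {
    // Made-up key name; every input registered on the job would share it.
    static final String TABLE_KEY = "sketch.inputformat.table";

    // Fixed-key style: a second input silently overwrites the first.
    static void setTable(Map<String, String> conf, String table) {
        conf.put(TABLE_KEY, table);
    }

    // Namespaced style: each input writes under its own key suffix.
    static void setTable(Map<String, String> conf, int inputId, String table) {
        conf.put(TABLE_KEY + "." + inputId, table);
    }

    static String getTable(Map<String, String> conf, int inputId) {
        return conf.get(TABLE_KEY + "." + inputId);
    }

    public static void main(String[] args) {
        Map<String, String> shared = new HashMap<>();
        setTable(shared, "tableA");
        setTable(shared, "tableB");
        System.out.println(shared.get(TABLE_KEY)); // prints "tableB"; tableA is lost

        Map<String, String> namespaced = new HashMap<>();
        setTable(namespaced, 0, "tableA");
        setTable(namespaced, 1, "tableB");
        // prints "tableA tableB"; both inputs survive
        System.out.println(getTable(namespaced, 0) + " " + getTable(namespaced, 1));
    }
}
```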

This message was sent by Atlassian JIRA
