My apologies if this is covered somewhere, I've done a lot of searching and come up dry.
I am migrating a set of applications from Hadoop 1.0.3/Accumulo 1.4.1 to Hadoop 2.6.0/Accumulo 1.6.1. The applications are launched by my custom Java apps using the Hadoop Tool/Configured interface setup; nothing unusual there.
To run MR jobs with AccumuloInputFormat/OutputFormat in 1.0, I could use tool.sh to launch the programs, which worked great for local on-cluster launching. However, I also needed to launch from remote hosts (possibly even Windows ones), so I would bundle a large lib dir with everything I needed on the client side, and fill out HADOOP_CLASSPATH in hadoop-env.sh with everything I needed (basically the output of accumulo classpath). This worked for remote submissions, and for local ones too, specifically using my Java mains to launch the jobs without any Accumulo or Hadoop wrapper scripts.
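For reference, the 1.0-era setup amounted to something like this in hadoop-env.sh (the paths here are illustrative, not my actual layout):

```shell
# hadoop-env.sh -- Hadoop 1.0-era setup (paths illustrative)
# roughly the output of `accumulo classpath`, pasted in ahead of the existing value
export HADOOP_CLASSPATH="/opt/accumulo/lib/*:/opt/zookeeper/zookeeper.jar:$HADOOP_CLASSPATH"
```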
In YARN MR 2.6 this doesn't seem to work. No matter what I do, I can't get a plain Java app to have the 2.x MR ApplicationMaster pick up the Accumulo items on its classpath, and my jobs fail with ClassNotFoundException. tool.sh works just fine, but again, I need to be able to submit without that environment.
I have tried (on the cluster):
HADOOP_CLASSPATH in hadoop-env.sh
HADOOP_CLASSPATH from .bashrc
yarn.application.classpath in yarn-site.xml
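The yarn-site.xml attempt looked roughly like this, appending the Accumulo lib dir to the stock 2.x entries (the Accumulo path is illustrative):

```xml
<!-- yarn-site.xml: default yarn.application.classpath entries plus Accumulo libs -->
<property>
  <name>yarn.application.classpath</name>
  <value>
    $HADOOP_CONF_DIR,
    $HADOOP_COMMON_HOME/share/hadoop/common/*,
    $HADOOP_COMMON_HOME/share/hadoop/common/lib/*,
    $HADOOP_HDFS_HOME/share/hadoop/hdfs/*,
    $HADOOP_HDFS_HOME/share/hadoop/hdfs/lib/*,
    $HADOOP_YARN_HOME/share/hadoop/yarn/*,
    $HADOOP_YARN_HOME/share/hadoop/yarn/lib/*,
    /opt/accumulo/lib/*
  </value>
</property>
```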
I don't mind using tool.sh locally (it's quite nice), but I need a strategy for setting up the cluster so that I can launch plain java, set the appropriate Hadoop configs for the remote fs and YARN hosts, set up my Accumulo connections and MapReduce input/output, and launch jobs that are Accumulo-aware.
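Concretely, the kind of submission I'm after looks something like this (hostnames, jar names, and the main class are all made up for illustration):

```shell
# hypothetical remote submission: plain `java`, no tool.sh or accumulo wrappers,
# pointing the Tool/GenericOptionsParser at the remote fs and YARN hosts
java -cp 'myapp.jar:lib/*' com.example.MyAccumuloJob \
  -D fs.defaultFS=hdfs://namenode.example.com:8020 \
  -D yarn.resourcemanager.hostname=rm.example.com \
  -D mapreduce.framework.name=yarn
```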