You should look at the job conf file.
You will see that the classes for the mapper and reducer are indeed written there explicitly, by name.
So if you generate the class only on the client, the other machines won't be able to load it.
You should also look at Cascading, which does some of what you are trying to do.
The trick it uses is that the mapper and reducer are just deserializing wrapper classes.
They read the serialized logic (which can be any graph of serialized objects) from the job conf file.
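A minimal sketch of that wrapper idea in plain Java, assuming the logic classes themselves are already in the job jar, so that only the configured object graph travels in the conf. A `Map<String, String>` stands in for the job conf. In a real job you would presumably use `Configuration.set`/`get` and do the loading in `Mapper.setup(Context)`. The key `example.serialized.logic` and the classes `RecordLogic`, `PrefixLogic`, `UpperLogic`, `ChainLogic` and `WrapperMapper` are hypothetical illustrations, not Cascading's actual API.

```java
import java.io.*;
import java.util.Base64;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class ConfSerializedLogic {

    // Per-record logic. The implementing classes must be on every task's
    // classpath (i.e. in the job jar); only configured instances are shipped.
    public interface RecordLogic extends Serializable {
        String apply(String record);
    }

    public static class PrefixLogic implements RecordLogic {
        private final String prefix;
        public PrefixLogic(String prefix) { this.prefix = prefix; }
        public String apply(String record) { return prefix + record; }
    }

    public static class UpperLogic implements RecordLogic {
        public String apply(String record) { return record.toUpperCase(Locale.ROOT); }
    }

    // Composes two pieces of logic, so the shipped value is a small object graph.
    public static class ChainLogic implements RecordLogic {
        private final RecordLogic first;
        private final RecordLogic second;
        public ChainLogic(RecordLogic first, RecordLogic second) {
            this.first = first;
            this.second = second;
        }
        public String apply(String record) { return second.apply(first.apply(record)); }
    }

    // Hypothetical conf key; a Map stands in for the job conf.
    static final String LOGIC_KEY = "example.serialized.logic";

    // Client side: serialize the object graph and store it as a Base64 string.
    public static void storeLogic(Map<String, String> conf, RecordLogic logic) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(logic);
        }
        conf.put(LOGIC_KEY, Base64.getEncoder().encodeToString(bytes.toByteArray()));
    }

    // Task side: decode the string and deserialize the object graph.
    public static RecordLogic loadLogic(Map<String, String> conf)
            throws IOException, ClassNotFoundException {
        byte[] raw = Base64.getDecoder().decode(conf.get(LOGIC_KEY));
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(raw))) {
            return (RecordLogic) in.readObject();
        }
    }

    // Generic wrapper playing the mapper's role; it knows nothing about the concrete logic.
    public static class WrapperMapper {
        private RecordLogic logic;
        public void setup(Map<String, String> conf) throws IOException, ClassNotFoundException {
            logic = loadLogic(conf);
        }
        public String map(String record) { return logic.apply(record); }
    }

    public static void main(String[] args) throws Exception {
        Map<String, String> conf = new HashMap<>();
        storeLogic(conf, new ChainLogic(new UpperLogic(), new PrefixLogic("seen: ")));
        WrapperMapper mapper = new WrapperMapper();
        mapper.setup(conf);
        System.out.println(mapper.map("hello")); // prints "seen: HELLO"
    }
}
```

Note that Java serialization ships an object's state, not its class bytecode. Deserializing an instance of a class that exists only on the client would still fail with `ClassNotFoundException` on the task side, which is why the logic classes have to live in the jar.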
When submitting a job, the ToolRunner or JobClient just distributes your jars to HDFS, so that the TaskTrackers can launch (or "re-run") them. In your case, you should have your dynamic class regenerated in the mapper's/reducer's setup method; otherwise the runtime classloader won't find any of them.

On Tue, Nov 13, 2012 at 7:58 AM, Jay Vyas <email@example.com> wrote:
Hi guys: I'm trying to dynamically create a Java class at runtime and submit it as a Hadoop job. How does the Mapper (or, for that matter, the Reducer) use the data in the Job object? That is, how does it load a class? Is the Job object serialized, along with all the info necessary to load a class? The reason I'm wondering is that, in reality, the class I'm creating will not be on the classpath of the JVMs in a distributed environment. But it will exist when the Job is created. So I'm wondering whether simply "creating" a dynamic class inside the code that creates the job will get it serialized and sent over the wire in such a way that it can be instantiated in a different JVM.
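The suggestion above to regenerate the dynamic class in the mapper's/reducer's setup method can be sketched like this: only a small "recipe" string travels in the conf, and each task JVM builds the class locally. The thread doesn't say how the class is generated, so as a stand-in this sketch uses a JDK dynamic proxy (`java.lang.reflect.Proxy`), which creates a new class at runtime in the JVM that calls it. The `Map` standing in for the job conf, the key `example.dynamic.suffix`, and the names `Transformer`, `generate` and `WrapperMapper` are all hypothetical.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.HashMap;
import java.util.Map;

public class RegenerateInSetup {

    // Interface compiled into the job jar, so every task JVM can load it.
    public interface Transformer {
        String transform(String record);
    }

    // Hypothetical conf key holding the recipe for the dynamic class.
    static final String RECIPE_KEY = "example.dynamic.suffix";

    // Generates a proxy class at runtime from the recipe. Calling this in each
    // JVM recreates the class there instead of trying to ship it over the wire.
    public static Transformer generate(String suffix) {
        InvocationHandler handler = (proxy, method, args) -> {
            switch (method.getName()) {
                case "transform": return args[0] + suffix;
                // Object methods are routed through the handler too.
                case "toString":  return "GeneratedTransformer[" + suffix + "]";
                case "hashCode":  return System.identityHashCode(proxy);
                case "equals":    return proxy == args[0];
                default: throw new UnsupportedOperationException(method.getName());
            }
        };
        return (Transformer) Proxy.newProxyInstance(
                Transformer.class.getClassLoader(),
                new Class<?>[] { Transformer.class },
                handler);
    }

    // Generic wrapper playing the mapper's role.
    public static class WrapperMapper {
        private Transformer transformer;
        // Runs once per task, inside the task's own JVM.
        public void setup(Map<String, String> conf) {
            transformer = generate(conf.get(RECIPE_KEY));
        }
        public String map(String record) { return transformer.transform(record); }
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(RECIPE_KEY, "!");               // client side: ship only the recipe
        WrapperMapper mapper = new WrapperMapper();
        mapper.setup(conf);                      // task side: regenerate the class
        System.out.println(mapper.map("hello")); // prints "hello!"
    }
}
```

The design choice here is that the generator is deterministic: given the same recipe, every JVM produces an equivalent class, so nothing but configuration needs to cross the wire.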