hadoop-mapreduce-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Steve S (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-4688) setJarByClass does not work under JBoss AS 7
Date Mon, 14 Apr 2014 21:02:15 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-4688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13968833#comment-13968833
] 

Steve S commented on MAPREDUCE-4688:
------------------------------------

Thanks for this workaround.  It really helped me out!

One suggestion I might make is to modify the code as follows.  It allows the job wrapper to
work in either JBoss or standalone(without any sort of check or custom handling for both situations)
and doesn't introduce the dependency on a JBoss binary at compile or runtime(unless it's triggered
at runtime by running in JBoss and having a vfs resource).

                if (!"vfs".equals(classUrl.getProtocol())) {
                    super.setJarByClass(cls);
                    continue;
                }

                final String jarFile = classUrl.getFile().substring(0, classUrl.getFile().length()
- (classFile.length() + 1) ); //+1 because of '/'
                final URL jarUrl = new URL(classUrl.getProtocol(), classUrl.getHost(), classUrl.getPort(),
jarFile,
                    (URLStreamHandler)Class.forName("org.jboss.vfs.protocol.VirtualFileURLStreamHandler").newInstance());

> setJarByClass does not work under JBoss AS 7
> --------------------------------------------
>
>                 Key: MAPREDUCE-4688
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4688
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 0.20.2, 1.0.3
>         Environment: Hadoop Cloudera CDH3 Cluster w/Hadoop 0.20.2 on CentOS 5/6
> Client on JBoss AS 7.1.1.Final CentOS/Windows
>            Reporter: Philippe
>            Priority: Minor
>              Labels: patch
>
> Hello,
> I’m using Hadoop as a client from a J2EE web application. One of the lib within my
EAR is a jar containing several Map/Reduce jobs. Using JBoss AS 4 in the past, I had no problem
running the jobs with the following code:
> {code}
> try {
>     final Configuration conf = HBaseConfiguration.create();
>     // Load all Hadoop configuration
>     conf.addResource("core-site.xml");
>     conf.addResource("hdfs-site.xml");
>     conf.addResource("mapred-site.xml");
>     conf.addResource("hbase-site.xml");
>     final Job job = new Job(conf, "My Job");
>     job.setJarByClass(MyJobClass.getClass());
> 			
>     			
>     TableMapReduceUtil.initTableMapperJob(...);
> 			
>     TableMapReduceUtil.initTableReducerJob(...);
> 			
>     final boolean status = job.waitForCompletion(true);
> } ...
> {code}
> Since, we have moved to JBoss AS 7 and the method setJarByClass is no longer working.
Indeed, in *org.apache.hadoop.mapred.JobConf.findContainingJar(Class)* the retrieved URL does
not have a *jar* protocol, but a JBoss *vfs* protocol. So, it always return null and the jar
is not sent to the Map/Reduce cluster.
> With the VFS protocol the resource name may be or may not be the actual system file name
of the resource. I mean, the class file is within the jar file which may be within an ear
file in case of a non-exploded deployment, so there are not system File corresponding to the
resource. Though, I guess similar issues may happen with jar: protocol.
> In order to make the job working with JBoss AS-7, I did the following implementation
of the Job class. This override the setJarByClass mechanism, by creating a temporary jar file
from the actual jar file read from vfs.
> {code}
> /**
>  * Patch of Map/Red Job to handle VFS jar file
>  */
> public class VFSJob extends Job
> {
> 	
>     /** Logger */
>     private static final transient Logger logger = LoggerFactory.getLogger(VFSJob.class);
>     public VFSJob() throws IOException {
>         super();
>     }
>     public VFSJob(Configuration conf) throws IOException {
>         super(conf);
>     }
>     public VFSJob(Configuration conf, String jobName) throws IOException {
>         super(conf, jobName);
>     }
> 	
>     private File temporaryJarFile;
>     /**
>      * Patch of setJarByClass to handle VFS
>      */
>     @Override
>     public void setJarByClass(Class<?> cls) {
>        final ClassLoader loader = cls.getClassLoader();
>         final String classFile = cls.getName().replaceAll("\\.", "/") + ".class";
>         JarInputStream is = null;
>         JarOutputStream os = null;
>         try {
>             final Enumeration<URL> itr = loader.getResources(classFile);
>             while (itr.hasMoreElements()) {
>                 final URL classUrl = itr.nextElement();
>                 // This is the trick
>                 if (!"vfs".equals(classUrl.getProtocol())) {
>                     continue;
>                 }
> 			   
>                 final String jarFile = classUrl.getFile().substring(0, classUrl.getFile().length()
- (classFile.length() + 1) ); //+1 because of '/'
>                 final URL jarUrl = new URL(classUrl.getProtocol(), classUrl.getHost(),
classUrl.getPort(), jarFile, new org.jboss.vfs.protocol.VirtualFileURLStreamHandler());
> 			  
>                 temporaryJarFile = File.createTempFile("mapred", ".jar");
>                 is = (JarInputStream) jarUrl.openStream();
>                 os = new JarOutputStream(new FileOutputStream(temporaryJarFile));
>                 final byte[] buffer = new byte[2048];
>                 for (JarEntry entry = is.getNextJarEntry(); entry !=null; entry = is.getNextJarEntry())
{
>                     os.putNextEntry(entry);
>                     int bytesRead;
>                     while ((bytesRead = is.read(buffer)) != -1) {
>                         os.write(buffer, 0, bytesRead);
>                     }
>                 }
>                 this.conf.setJar(temporaryJarFile.getPath());
>                 return;
>             }
>         } catch (IOException e) {
>             throw new RuntimeException(e);
>         } finally {
>             if (is != null) {
>                 try {
>                     is.close();
>                 } catch (IOException e) {
>                     logger.error("Error closing input stream", e);
>                 }
>             }
>             if (os != null) {
>                 try {
>                     os.close();
>                 } catch (IOException e) {
>                     logger.error("Error closing output stream", e);
>                 }
>             }
>         }
> 		
>         return;
>     }
> 	
>     /**
>      * Clean the temporary jar file created
>      */
>     public void clean() {
>         if (temporaryJarFile != null && temporaryJarFile.exists()) {
>             final boolean isDeleted = temporaryJarFile.delete();
>             if (!isDeleted) {
>                 logger.error("Error while deleting temporary jar " + temporaryJarFile);
>             }
>         }
>     }
> }
> {code}
> So, after the call, I must not forget deleting the temporary jar file.
> {code}
> VFSJob job = null;
> try {
>     final Configuration conf = HBaseConfiguration.create();
>     // Load all Hadoop configuration
>     conf.addResource("core-site.xml");
>     conf.addResource("hdfs-site.xml");
>     conf.addResource("mapred-site.xml");
>     conf.addResource("hbase-site.xml");
>     job = new VFSJob(conf, "My Job");
>     job.setJarByClass(MyJobClass.getClass());
> 			
> 			
>     TableMapReduceUtil.initTableMapperJob(...);
> 			
>     TableMapReduceUtil.initTableReducerJob(...);
> 			
>     final boolean status = job.waitForCompletion(true);
> } ...
> } finally {
>     job.clean();
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)

Mime
View raw message