giraph-user mailing list archives

From Martin Junghanns <martin.jungha...@gmx.net>
Subject Problem Giraph + Hadoop + HBase
Date Sat, 21 Feb 2015 12:20:23 GMT
Hi all,

This might be a rather specific question, and I don't know whether the
problem is related to Giraph, Hadoop, or HBase, but maybe someone has an idea.

I am running an application on a cluster using:

Hadoop 2.5.1
Giraph 1.1.0-hadoop2
HBase 0.98.10.1-hadoop2

Giraph jobs run fine when I start them via the GiraphRunner using text-based
input formats. My application is a fat jar containing the Giraph libs, but not
the HBase libs (provided). The HBase libs are in the HADOOP_CLASSPATH, and
MapReduce jobs using HBase as data source / sink run fine.

The problem occurs when I start a GiraphJob from my Driver program. The 
driver does the following:
1) Bulk Load text data into HBase via MapReduce
2) Run a Giraph algorithm using HBase as data source (using 
TableInputFormat)

The *driver runs fine in a unit test* using the MiniCluster.

When I start the driver on a cluster, step 1) runs successfully, but after the
GiraphJob is submitted, I get the following error:

2015-02-21 12:50:38,954 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: OutputCommitter set in config null
2015-02-21 12:50:39,018 FATAL [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Error starting MRAppMaster
java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/mapreduce/TableInputFormat
	at org.myapp.io.HBaseVertexInputFormat.<clinit>(HBaseVertexInputFormat.java:48)
	at java.lang.Class.forName0(Native Method)
	at java.lang.Class.forName(Class.java:274)
	at org.apache.hadoop.conf.Configuration.getClassByNameOrNull(Configuration.java:1844)
	at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1809)
	at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1903)
	at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1929)
	at org.apache.giraph.conf.ClassConfOption.get(ClassConfOption.java:128)
	at org.apache.giraph.conf.GiraphClasses.<init>(GiraphClasses.java:180)
	at org.apache.giraph.conf.ImmutableClassesGiraphConfiguration.<init>(ImmutableClassesGiraphConfiguration.java:138)
	at org.apache.giraph.bsp.BspOutputFormat.getOutputCommitter(BspOutputFormat.java:62)
	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.createOutputCommitter(MRAppMaster.java:473)
	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceInit(MRAppMaster.java:376)
	at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$1.run(MRAppMaster.java:1485)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.initAndStartAppMaster(MRAppMaster.java:1482)
	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1415)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hbase.mapreduce.TableInputFormat
	at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
	at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
	at java.security.AccessController.doPrivileged(Native Method)
	at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
	... 20 more
2015-02-21 12:50:39,021 INFO [main] org.apache.hadoop.util.ExitUtil: Exiting with status 1

HBaseVertexInputFormat.java:48: protected static final TableInputFormat BASE_FORMAT = new TableInputFormat();
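
The trace above shows `<clinit>` running directly under `Class.forName`, which means the static field on line 48 is evaluated as soon as the class is looked up by name. The following is a minimal, self-contained sketch of that JVM behaviour using only the standard library. `StaticInitDemo` and `Holder` are hypothetical names, and the `Object` field only stands in for the `TableInputFormat` field; no Giraph or HBase code is involved.

```java
public class StaticInitDemo {
    // Flipped to true when Holder's static initializer runs.
    static boolean initialized = false;

    static class Holder {
        // Stand-in for: static final TableInputFormat BASE_FORMAT = new TableInputFormat();
        static final Object BASE_FORMAT = create();

        private static Object create() {
            initialized = true;
            return new Object();
        }
    }

    /** Returns {flag after load-only lookup, flag after initializing lookup}. */
    public static boolean[] probe() {
        ClassLoader cl = StaticInitDemo.class.getClassLoader();
        String name = StaticInitDemo.class.getName() + "$Holder";
        try {
            // initialize=false: the class is loaded, but static fields are not evaluated yet.
            Class.forName(name, false, cl);
            boolean afterLoad = initialized;
            // initialize=true: the static initializer (<clinit>) runs here.
            Class.forName(name, true, cl);
            boolean afterInit = initialized;
            return new boolean[] {afterLoad, afterInit};
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        boolean[] r = probe();
        System.out.println("after load only: " + r[0]);
        System.out.println("after initialization: " + r[1]);
    }
}
```

If the static initializer references a class that is missing from the classpath, the failure surfaces at the initializing lookup, which matches the `NoClassDefFoundError` reported at `HBaseVertexInputFormat.<clinit>`.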

The class *org/apache/hadoop/hbase/mapreduce/TableInputFormat* is contained in
*hbase-server-0.98.10.1-hadoop2.jar*, which is in the HADOOP_CLASSPATH and,
according to the NodeManager logs, gets downloaded from staging when the
application runs.
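
One way to check whether a given JVM can actually see that class, and from which jar, is a small diagnostic like the sketch below. It uses only the standard library; `ClassLocator` and `locate` are hypothetical names, and it would have to be run inside the failing process (or with the same classpath) to say anything about the MRAppMaster.

```java
import java.security.CodeSource;

public class ClassLocator {
    /** Returns the location a class is loaded from, or a note if it cannot be resolved. */
    public static String locate(String className) {
        try {
            // initialize=false: resolve the class without running its static initializers.
            Class<?> c = Class.forName(className, false, ClassLocator.class.getClassLoader());
            CodeSource src = c.getProtectionDomain().getCodeSource();
            // JDK core classes have no code source; application classes report their jar or directory.
            return src == null ? "bootstrap or unknown source" : String.valueOf(src.getLocation());
        } catch (ClassNotFoundException | LinkageError e) {
            return "NOT FOUND: " + e;
        }
    }

    public static void main(String[] args) {
        // The class named in the stack trace.
        System.out.println(locate("org.apache.hadoop.hbase.mapreduce.TableInputFormat"));
        // The classpath this JVM was started with, for comparison with HADOOP_CLASSPATH.
        System.out.println(System.getProperty("java.class.path"));
    }
}
```

Comparing the output on the client with the output inside the container would show whether the jar is present on one classpath but not the other.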

The GiraphJob is initialized in the driver the following way:

//...
conf.set(TableInputFormat.INPUT_TABLE, MY_TABLE);
conf.set(TableOutputFormat.OUTPUT_TABLE, MY_TABLE);

GiraphJob job = new GiraphJob(conf, JOB_NAME);
GiraphConfiguration giraphConf = job.getConfiguration();
giraphConf.setComputationClass(MyComputation.class);
giraphConf.setVertexInputFormatClass(MyHBaseVertexInputFormat.class);
giraphConf.setVertexOutputFormatClass(MyHBaseVertexOutputFormat.class);
giraphConf.setWorkerConfiguration(workerCount, workerCount, 100f);

job.run(verbose);
//...

FYI, the *driver ran fine on a Hadoop 1.2.1 cluster with the HBase and Giraph
libs (hadoop1) packaged in my jar*.
But since packaging them should not really be necessary (at least for HBase),
there seems to be a problem loading the jars in the GiraphJob.

I hope someone has an idea.

Thanks in advance.

Cheers,
Martin







