systemml-dev mailing list archives

From "Matthias Boehm" <mbo...@us.ibm.com>
Subject Re: Compatibility with MR1 Cloudera cdh4.2.1
Date Fri, 05 Feb 2016 21:03:25 GMT

ok, that is interesting. I think the following is happening: the Hadoop
version is >2.0, which makes SystemML switch to the 2.x configuration
properties. However, because MR1 is bundled into this distribution, these
configuration properties do not exist, which makes us fail when processing
task IDs.

Workaround: Change
org.apache.sysml.runtime.matrix.mapred.MRConfigurationNames line 85 to
"boolean hadoopVersion2 = false".

Regards,
Matthias



From:	Ethan Xu <ethan.yifanxu@gmail.com>
To:	dev@systemml.incubator.apache.org
Date:	02/05/2016 12:36 PM
Subject:	Re: Compatibility with MR1 Cloudera cdh4.2.1



Thank you very much. I just pulled the update, rebuilt the project and
reran the code.

The method-not-found error was gone, and the MapReduce job was kicked off.
The 'Assign-RowID-MR' job finished successfully.
The map phase of 'CSV-Reblock-MR' job finished, but reducers threw
NullPointerExceptions at

java.lang.NullPointerException
	at org.apache.sysml.runtime.matrix.mapred.ReduceBase.close(ReduceBase.java:205)
	at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:516)
	at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:447)
	at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
	at org.apache.hadoop.mapred.Child.main(Child.java:262)

The job I ran was the same as before on the same data:

hadoop jar <SystemML dir>/target/SystemML.jar \
  -libjars <local dir>/hadoop-lzo-0.4.15.jar \
  -f <SystemML dir>/scripts/algorithms/Univar-Stats.dml \
  -nvargs X=<HDFS dir>/original-coded.csv \
    TYPES=<HDFS dir>/original-coded-type.csv \
    STATS=<HDFS dir>/univariate-summary.csv

The hadoop cluster was also the same one: CDH4.2.1.

Sorry to keep coming back with problems on a really old hadoop system.
Please let me know what other information is needed to diagnose the issue.

Ethan


On Fri, Feb 5, 2016 at 1:26 PM, Deron Eriksson <deroneriksson@gmail.com>
wrote:

> Hi Ethan,
>
> I believe your safest, cleanest bet is to wait for the fix from Matthias.
> When he pushes the fix, you will see it at
> https://github.com/apache/incubator-systemml/commits/master. At that
> point,
> you can pull (git pull) the changes from GitHub to your machine and then
> build with Maven utilizing the new changes.
>
> Alternatively, it's not really recommended, but you might be able to use
> -libjars to reference the hadoop-commons jar, which should be in your local
> maven repository
> (.m2/repository/org/apache/hadoop/hadoop-common/2.4.1/hadoop-common-2.4.1.jar).
> However, mixing jar versions usually doesn't work very well (it can lead to
> other problems), so waiting for the fix is best.
>
> Deron
>
>
> On Fri, Feb 5, 2016 at 6:47 AM, Ethan Xu <ethan.yifanxu@gmail.com> wrote:
>
> > Thank you Shirish and Deron for the suggestions. Looking forward to the fix
> > from Matthias!
> >
> > We are using the hadoop-common shipped with CDH4.2.1, and it's in
> > classpath. I'm a bit hesitant to alter our hadoop configuration to
> include
> > other versions since other people are using it too.
> >
> > Not sure if/how the following naive approach affects the program behavior,
> > but I did try changing the scope of
> >
> > <groupId>org.apache.hadoop</groupId>
> > <artifactId>hadoop-common</artifactId>
> > <version>${hadoop.version}</version>
> >
> > in SystemML's pom.xml from 'provided' to 'compile' and rebuilt the jar
> > (21MB), and it threw the same error.
> >
> > By the way this is in pom.xml line 65 - 72:
> > <properties>
> >           <hadoop.version>2.4.1</hadoop.version>
> >           <antlr.version>4.3</antlr.version>
> >           <spark.version>1.4.1</spark.version>
> >
> >           <!-- OS-specific JVM arguments for running integration tests -->
> >           <integrationTestExtraJVMArgs />
> > </properties>
> >
> > Am I supposed to modify the hadoop.version before build?
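As a side note on the question above: Maven properties declared in a pom's <properties> section can normally be overridden from the command line with -D, without editing the pom itself. Whether a given hadoop.version actually resolves also depends on the repositories configured (CDH artifacts live in the Cloudera repository), so treat this as a sketch, not a verified build recipe:

```shell
# Sketch: override the hadoop.version property at build time instead of
# editing pom.xml. The CDH coordinate shown is illustrative and requires
# the Cloudera artifact repository to be configured.
mvn clean package -Dhadoop.version=2.0.0-mr1-cdh4.2.1
```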
> >
> > Thanks again,
> >
> > Ethan
> >
> >
> >
> > On Fri, Feb 5, 2016 at 2:29 AM, Deron Eriksson
<deroneriksson@gmail.com>
> > wrote:
> >
> > > Hi Matthias,
> > >
> > > Glad to hear the fix is simple. Mixing jar versions sometimes is not very
> > > fun.
> > >
> > > Deron
> > >
> > >
> > > On Thu, Feb 4, 2016 at 11:10 PM, Matthias Boehm <mboehm@us.ibm.com>
> > wrote:
> > >
> > > > well, let's not mix different hadoop versions in the class path or
> > > > client/server. If I'm not mistaken, cdh 4.x always shipped with MR v1.
> > > > It's a trivial fix for us and will be in the repo tomorrow morning anyway.
> > > > Thanks for catching this issue Ethan.
> > > >
> > > > Regards,
> > > > Matthias
> > > >
> > > >
> > > > From: Deron Eriksson <deroneriksson@gmail.com>
> > > > To: dev@systemml.incubator.apache.org
> > > > Date: 02/04/2016 11:04 PM
> > > > Subject: Re: Compatibility with MR1 Cloudera cdh4.2.1
> > > > ------------------------------
> > > >
> > > >
> > > >
> > > > Hi Ethan,
> > > >
> > > > Just FYI, I looked at hadoop-common-2.0.0-cdh4.2.1.jar
> > > > (https://repository.cloudera.com/artifactory/repo/org/apache/hadoop/hadoop-common/2.0.0-cdh4.2.1/),
> > > > since I don't see a 2.0.0-mr1-cdh4.2.1 version, and the
> > > > org.apache.hadoop.conf.Configuration class in that jar doesn't appear to
> > > > have a getDouble method, so using that version of hadoop-common won't
> > > > work.
> > > >
> > > > However, the hadoop-common-2.4.1.jar
> > > > (https://repository.cloudera.com/artifactory/repo/org/apache/hadoop/hadoop-common/2.4.1/)
> > > > does appear to have the getDouble method. It's possible that adding that
> > > > jar to your classpath may fix your problem, as Shirish pointed out.
> > > >
> > > > It sounds like Matthias may have another fix.
> > > >
> > > > Deron
> > > >
> > > >
> > > >
> > > > On Thu, Feb 4, 2016 at 6:40 PM, Matthias Boehm <mboehm@us.ibm.com>
> > > wrote:
> > > >
> > > > > well, we did indeed not run on MR v1 for a while now. However, I don't
> > > > > want to go that far and say we don't support it anymore. I'll fix this
> > > > > particular issue by tomorrow.
> > > > >
> > > > > In the next couple of weeks we should run our full performance testsuite
> > > > > (for broad coverage) over an MR v1 cluster and systematically remove
> > > > > unnecessary incompatibilities like this one. Any volunteers?
> > > > >
> > > > > Regards,
> > > > > Matthias
> > > > >
> > > > >
> > > > > From: Ethan Xu <ethan.yifanxu@gmail.com>
> > > > > To: dev@systemml.incubator.apache.org
> > > > > Date: 02/04/2016 05:51 PM
> > > > > Subject: Compatibility with MR1 Cloudera cdh4.2.1
> > > > > ------------------------------
> > > >
> > > > >
> > > > >
> > > > >
> > > > > Hello,
> > > > >
> > > > > I got an error when running the systemML/scripts/Univar-Stats.dml script
> > > > > on a hadoop cluster (Cloudera CDH4.2.1) on a 6GB data set. The error
> > > > > message is at the bottom of the email. The same script ran fine on a
> > > > > smaller sample (several MB) of the same data set, when MR was not invoked.
> > > > >
> > > > > The main error was java.lang.NoSuchMethodError:
> > > > > org.apache.hadoop.mapred.JobConf.getDouble()
> > > > > Digging deeper, it looks like the CDH4.2.1 version of MR indeed doesn't
> > > > > have the JobConf.getDouble() method.
> > > > >
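For context on what is missing here: in Hadoop 2.x, JobConf inherits getDouble(name, defaultValue) from Configuration, which parses the raw string value of the property. The following is a hypothetical compatibility shim (not SystemML code) mirroring those semantics using only a plain string lookup, as the older get(String) accessor would provide; a java.util.Map stands in for the configuration object so the sketch is self-contained:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical compatibility shim (not SystemML code): mimics the behavior of
// Hadoop 2.x Configuration.getDouble(name, defaultValue) using only a plain
// string lookup, as would be available on older configuration APIs.
public class GetDoubleCompat {

    public static double getDouble(Map<String, String> conf, String name, double defaultValue) {
        String raw = conf.get(name);
        if (raw == null || raw.trim().isEmpty()) {
            return defaultValue; // key absent: fall back to the default
        }
        return Double.parseDouble(raw.trim());
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("io.sort.record.percent", "0.05");
        System.out.println(getDouble(conf, "io.sort.record.percent", 0.17)); // 0.05
        System.out.println(getDouble(conf, "missing.key", 0.17));            // 0.17
    }
}
```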
> > > > > The hadoop-core jar of CDH4.2.1 can be found here:
> > > > > https://repository.cloudera.com/artifactory/repo/org/apache/hadoop/hadoop-core/2.0.0-mr1-cdh4.2.1/

> > > >
> > > > >
> > > > > The calling line of SystemML is line 1194 of
> > > > > https://github.com/apache/incubator-systemml/blob/master/src/main/java/org/apache/sysml/runtime/matrix/mapred/MRJobConfiguration.java

> > > > >
> > > > > I was wondering, if the finding is accurate, is there a potential fix,
> > > > > or does this mean the current version of SystemML is not compatible with
> > > > > CDH4.2.1?
> > > > >
> > > > > Thank you,
> > > > >
> > > > > Ethan
> > > > >
> > > > >
> > > > > hadoop jar $sysDir/target/SystemML.jar -f
> > > > > $sysDir/scripts/algorithms/Univar-Stats.dml -nvargs
> > > > > X=$baseDirHDFS/original-coded.csv
> > > > > TYPES=$baseDirHDFS/original-coded-type.csv
> > > > > STATS=$baseDirHDFS/univariate-summary.csv
> > > > >
> > > > > 16/02/04 20:35:03 INFO api.DMLScript: BEGIN DML run 02/04/2016 20:35:03
> > > > > 16/02/04 20:35:03 INFO api.DMLScript: HADOOP_HOME: null
> > > > > 16/02/04 20:35:03 WARN conf.DMLConfig: No default SystemML config file (./SystemML-config.xml) found
> > > > > 16/02/04 20:35:03 WARN conf.DMLConfig: Using default settings in DMLConfig
> > > > > 16/02/04 20:35:04 WARN hops.OptimizerUtils: Auto-disable multi-threaded text read for 'text' and 'csv' due to thread contention on JRE < 1.8 (java.version=1.7.0_71).
> > > > > SLF4J: Class path contains multiple SLF4J bindings.
> > > > > SLF4J: Found binding in [jar:file:/usr/lib/zookeeper/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> > > > > SLF4J: Found binding in [jar:file:/usr/local/explorys/datagrid/lib/slf4j-jdk14-1.6.6.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> > > > > SLF4J: Found binding in [jar:file:/usr/local/explorys/datagrid/lib/logback-classic-1.0.6.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> > > > > SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
> > > > > 16/02/04 20:35:07 INFO api.DMLScript: SystemML Statistics:
> > > > > Total execution time:        0.880 sec.
> > > > > Number of executed MR Jobs:    0.
> > > > >
> > > > > 16/02/04 20:35:07 INFO api.DMLScript: END DML run 02/04/2016 20:35:07
> > > > > Exception in thread "main" java.lang.NoSuchMethodError:
> > > > > org.apache.hadoop.mapred.JobConf.getDouble(Ljava/lang/String;D)D
> > > > >    at org.apache.sysml.runtime.matrix.mapred.MRJobConfiguration.setUpMultipleInputs(MRJobConfiguration.java:1195)
> > > > >    at org.apache.sysml.runtime.matrix.mapred.MRJobConfiguration.setUpMultipleInputs(MRJobConfiguration.java:1129)
> > > > >    at org.apache.sysml.runtime.matrix.CSVReblockMR.runAssignRowIDMRJob(CSVReblockMR.java:307)
> > > > >    at org.apache.sysml.runtime.matrix.CSVReblockMR.runAssignRowIDMRJob(CSVReblockMR.java:289)
> > > > >    at org.apache.sysml.runtime.matrix.CSVReblockMR.runJob(CSVReblockMR.java:275)
> > > > >    at org.apache.sysml.lops.runtime.RunMRJobs.submitJob(RunMRJobs.java:257)
> > > > >    at org.apache.sysml.lops.runtime.RunMRJobs.prepareAndSubmitJob(RunMRJobs.java:143)
> > > > >    at org.apache.sysml.runtime.instructions.MRJobInstruction.processInstruction(MRJobInstruction.java:1500)
> > > > >    at org.apache.sysml.runtime.controlprogram.ProgramBlock.executeSingleInstruction(ProgramBlock.java:309)
> > > > >    at org.apache.sysml.runtime.controlprogram.ProgramBlock.executeInstructions(ProgramBlock.java:227)
> > > > >    at org.apache.sysml.runtime.controlprogram.ProgramBlock.execute(ProgramBlock.java:169)
> > > > >    at org.apache.sysml.runtime.controlprogram.Program.execute(Program.java:146)
> > > > >    at org.apache.sysml.api.DMLScript.execute(DMLScript.java:676)
> > > > >    at org.apache.sysml.api.DMLScript.executeScript(DMLScript.java:338)
> > > > >    at org.apache.sysml.api.DMLScript.main(DMLScript.java:197)
> > > > >    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> > > > >    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> > > > >    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> > > > >    at java.lang.reflect.Method.invoke(Method.java:606)
> > > > >    at org.apache.hadoop.util.RunJar.main(RunJar.java:208)

