systemml-dev mailing list archives

From "Matthias Boehm" <mbo...@us.ibm.com>
Subject Re: Compatibility with MR1 Cloudera cdh4.2.1
Date Sat, 06 Feb 2016 04:17:58 GMT

Thanks for letting us know Ethan. The reason was likely that
JobContext.TASK_ISMAP was not in the configuration.
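
One way such a lookup could tolerate the missing key is sketched below. It is only an illustration, not SystemML code: the class and method names are hypothetical, a plain Map stands in for the Hadoop configuration, and it assumes the MR2 property name is "mapreduce.task.ismap" (the value behind JobContext.TASK_ISMAP) while MR1 sets "mapred.task.is.map". The lookup tries the MR2 key, falls back to the MR1 key, and fails loudly instead of silently defaulting to "map".

```java
import java.util.HashMap;
import java.util.Map;

public class TaskTypeLookup {
    // Assumed property names; verify them against the Hadoop version in use.
    static final String MR2_KEY = "mapreduce.task.ismap";
    static final String MR1_KEY = "mapred.task.is.map";

    /** Returns "m" for map tasks and "r" for reduce tasks. */
    public static String taskType(Map<String, String> conf) {
        String value = conf.get(MR2_KEY);
        if (value == null) {
            // MR1 clusters only set the old-style key.
            value = conf.get(MR1_KEY);
        }
        if (value == null) {
            // No silent default: a wrong guess is what broke the reducers.
            throw new IllegalStateException(
                "neither " + MR2_KEY + " nor " + MR1_KEY + " is set");
        }
        return Boolean.parseBoolean(value) ? "m" : "r";
    }

    public static void main(String[] args) {
        // Simulated configuration of a reducer on an MR1 cluster.
        Map<String, String> mr1ReducerConf = new HashMap<>();
        mr1ReducerConf.put(MR1_KEY, "false");
        System.out.println(taskType(mr1ReducerConf));
    }
}
```

With a real Hadoop Configuration the same idea would use conf.get(key) for both names before deciding.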

We'll set up an MR v1 cluster next week and systematically resolve all these
incompatibility issues in order to avoid additional round trips.

Regards,
Matthias



From:	Ethan Xu <ethan.yifanxu@gmail.com>
To:	dev@systemml.incubator.apache.org
Date:	02/05/2016 08:04 PM
Subject:	Re: Compatibility with MR1 Cloudera cdh4.2.1



Seems it's a problem of

String taskType = (conf.getBoolean(JobContext.TASK_ISMAP, true)) ? "m" : "r";

on line 137 of

org.apache.sysml.runtime.matrix.data.MultipleOutputCommitter

conf.getBoolean(JobContext.TASK_ISMAP, true) returns 'true' even in reducers
(perhaps because the provided default value is used). As a result, 'charIx'
(line 139) gets the value -1, which leads to the error.

Confirmed on my test case: when the error occurs in a reducer, 'name' has the
value '0-r-00000', but 'taskType' has the value 'm'.
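
The failure mode can be reproduced without a cluster. The sketch below is an assumption-laden stand-in, not the committer's code: java.util.Properties plays the role of the job configuration, the getBoolean helper only mimics the "return the default when the key is missing" behavior, and the marker search via indexOf("-" + taskType + "-") is a guess at what lines 137-139 do, since the original lines are not quoted in this thread.

```java
import java.util.Properties;

public class MissingKeyDemo {
    // Mimics a config lookup that falls back to a default for absent keys.
    static boolean getBoolean(Properties conf, String key, boolean defaultValue) {
        String value = conf.getProperty(key);
        return (value == null) ? defaultValue : Boolean.parseBoolean(value);
    }

    // Assumed shape of the original lookup: derive the task type from the
    // configuration, then search the file name for the matching marker.
    public static int findTaskMarker(Properties conf, String name) {
        String taskType = getBoolean(conf, "mapreduce.task.ismap", true) ? "m" : "r";
        return name.indexOf("-" + taskType + "-");
    }

    public static void main(String[] args) {
        Properties conf = new Properties(); // key never set, as on the MR1 cluster
        String name = "0-r-00000";          // reducer output file name

        int charIx = findTaskMarker(conf, name);
        System.out.println("charIx = " + charIx);

        try {
            name.substring(0, charIx);
        } catch (StringIndexOutOfBoundsException e) {
            System.out.println("substring failed: " + e.getMessage());
        }
    }
}
```

Because the default is 'true', the search looks for "-m-" in a reducer file name, finds nothing, and the resulting -1 is what substring rejects.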

The following (ugly) hack fixed the problem. DML ran successfully
afterwards.

Change lines 137-139 of
    org.apache.sysml.runtime.matrix.data.MultipleOutputCommitter
to:

// requires: import java.util.regex.Matcher; import java.util.regex.Pattern;
String name = file.getName();
// find the task-type marker ("-r-" or "-m-") directly in the file name
Pattern p = Pattern.compile("-[rm]-");
Matcher m = p.matcher(name);
int charIx = 0;
if (m.find()) {
    charIx = m.start();
} else {
    throw new RuntimeException("file name: " + name + " doesn't contain 'r' or 'm'");
}
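
For trying the idea outside of SystemML, the same lookup can be wrapped in a small standalone class. This is only a sketch: the class name TaskMarker and the method markerIndex are made up for illustration, and the pattern is compiled once rather than on every call.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TaskMarker {
    // Matches the task-type marker of either a reducer or a mapper output file.
    private static final Pattern TASK_MARKER = Pattern.compile("-[rm]-");

    /** Returns the index of the first "-r-" or "-m-" in the file name. */
    public static int markerIndex(String name) {
        Matcher m = TASK_MARKER.matcher(name);
        if (!m.find()) {
            throw new IllegalArgumentException(
                "file name: " + name + " doesn't contain '-r-' or '-m-'");
        }
        return m.start();
    }

    public static void main(String[] args) {
        System.out.println(markerIndex("0-r-00000"));
        System.out.println(markerIndex("0-m-00003"));
    }
}
```

The lookup no longer depends on the job configuration at all, which is why it works regardless of whether the cluster sets the MR1 or the MR2 property.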

Ethan


On Fri, Feb 5, 2016 at 4:37 PM, Ethan Xu <ethan.yifanxu@gmail.com> wrote:

> Thanks, I tried that and got a bit further. Now there is a new exception
> (still in the reduce phase of 'CSV-Reblock-MR'):
>
> WARN org.apache.hadoop.mapred.Child: Error running child
> java.lang.StringIndexOutOfBoundsException: String index out of range: -1
>     at java.lang.String.substring(String.java:1911)
>     at org.apache.sysml.runtime.matrix.data.MultipleOutputCommitter.moveFileToDestination(MultipleOutputCommitter.java:140)
>     at org.apache.sysml.runtime.matrix.data.MultipleOutputCommitter.moveFinalTaskOutputs(MultipleOutputCommitter.java:119)
>     at org.apache.sysml.runtime.matrix.data.MultipleOutputCommitter.commitTask(MultipleOutputCommitter.java:94)
>     at org.apache.hadoop.mapred.OutputCommitter.commitTask(OutputCommitter.java:221)
>     at org.apache.hadoop.mapred.Task.commit(Task.java:1005)
>     at org.apache.hadoop.mapred.Task.done(Task.java:875)
>     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:453)
>     at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:415)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
>     at org.apache.hadoop.mapred.Child.main(Child.java:262)
>
>
>
> On Fri, Feb 5, 2016 at 4:03 PM, Matthias Boehm <mboehm@us.ibm.com> wrote:
>
>> OK, that is interesting. I think the following is happening: the Hadoop
>> version is >2.0, which makes SystemML switch to the 2.x configuration
>> properties. However, because MR1 is bundled into this distribution, these
>> configurations do not exist, which makes us fail when processing task IDs.
>>
>> Workaround: change line 85 of
>> org.apache.sysml.runtime.matrix.mapred.MRConfigurationNames to
>> "boolean hadoopVersion2 = false".
>>
>> Regards,
>> Matthias
>>
>> From: Ethan Xu <ethan.yifanxu@gmail.com>
>> To: dev@systemml.incubator.apache.org
>> Date: 02/05/2016 12:36 PM
>> Subject: Re: Compatibility with MR1 Cloudera cdh4.2.1
>> ------------------------------
>>
>>
>>
>> Thank you very much. I just pulled the update, rebuilt the project and
>> reran the code.
>>
>> The method-not-found error was gone, and the MapReduce job was kicked off.
>> The 'Assign-RowID-MR' job finished successfully.
>> The map phase of 'CSV-Reblock-MR' job finished, but reducers threw
>> NullPointerExceptions at
>>
>> java.lang.NullPointerException
>>     at org.apache.sysml.runtime.matrix.mapred.ReduceBase.close(ReduceBase.java:205)
>>     at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:516)
>>     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:447)
>>     at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
>>     at java.security.AccessController.doPrivileged(Native Method)
>>     at javax.security.auth.Subject.doAs(Subject.java:415)
>>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
>>     at org.apache.hadoop.mapred.Child.main(Child.java:262)
>>
>> The job I ran was the same as before on the same data:
>> hadoop jar <SystemML dir>/target/SystemML.jar \
>>   -libjars <local dir>/hadoop-lzo-0.4.15.jar \
>>   -f <SystemML dir>/scripts/algorithms/Univar-Stats.dml \
>>   -nvargs X=<HDFS dir>/original-coded.csv \
>>     TYPES=<HDFS dir>/original-coded-type.csv \
>>     STATS=<HDFS dir>/univariate-summary.csv
>>
>> The Hadoop cluster was also the same one: CDH4.2.1.
>>
>> Sorry to keep coming back with problems on a really old Hadoop system.
>> Please let me know what other information is needed to diagnose the issue.
>>
>> Ethan
>>
>>
>> On Fri, Feb 5, 2016 at 1:26 PM, Deron Eriksson <deroneriksson@gmail.com>
>> wrote:
>>
>> > Hi Ethan,
>> >
>> > I believe your safest, cleanest bet is to wait for the fix from Matthias.
>> > When he pushes the fix, you will see it at
>> > https://github.com/apache/incubator-systemml/commits/master. At that point,
>> > you can pull (git pull) the changes from GitHub to your machine and then
>> > build with Maven utilizing the new changes.
>> >
>> > Alternatively, it's not really recommended, but you might be able to use
>> > -libjars to reference the hadoop-common jar, which should be in your local
>> > Maven repository
>> > (.m2/repository/org/apache/hadoop/hadoop-common/2.4.1/hadoop-common-2.4.1.jar).
>> > However, mixing jar versions usually doesn't work very well (it can lead to
>> > other problems), so waiting for the fix is best.
>> >
>> > Deron
>> >
>> >
>> > On Fri, Feb 5, 2016 at 6:47 AM, Ethan Xu <ethan.yifanxu@gmail.com> wrote:
>> >
>> > > Thank you Shirish and Deron for the suggestions. Looking forward to the
>> > > fix from Matthias!
>> > >
>> > > We are using the hadoop-common shipped with CDH4.2.1, and it's in the
>> > > classpath. I'm a bit hesitant to alter our Hadoop configuration to include
>> > > other versions since other people are using it too.
>> > >
>> > > Not sure if/how the following naive approach affects the program behavior,
>> > > but I did try changing the scope of
>> > >
>> > > <groupId>org.apache.hadoop</groupId>
>> > > <artifactId>hadoop-common</artifactId>
>> > > <version>${hadoop.version}</version>
>> > >
>> > > in SystemML's pom.xml from 'provided' to 'compile' and rebuilt the jar
>> > > (21MB), and it threw the same error.
>> > >
>> > > By the way, this is in pom.xml lines 65-72:
>> > > <properties>
>> > >     <hadoop.version>2.4.1</hadoop.version>
>> > >     <antlr.version>4.3</antlr.version>
>> > >     <spark.version>1.4.1</spark.version>
>> > >
>> > >     <!-- OS-specific JVM arguments for running integration tests -->
>> > >     <integrationTestExtraJVMArgs />
>> > > </properties>
>> > >
>> > > Am I supposed to modify the hadoop.version before build?
>> > >
>> > > Thanks again,
>> > >
>> > > Ethan
>> > >
>> > >
>> > >
>> > > On Fri, Feb 5, 2016 at 2:29 AM, Deron Eriksson <deroneriksson@gmail.com> wrote:
>> > >
>> > > > Hi Matthias,
>> > > >
>> > > > Glad to hear the fix is simple. Mixing jar versions sometimes is not very
>> > > > fun.
>> > > >
>> > > > Deron
>> > > >
>> > > >
>> > > > On Thu, Feb 4, 2016 at 11:10 PM, Matthias Boehm <mboehm@us.ibm.com> wrote:
>> > > >
>> > > > > Well, let's not mix different Hadoop versions in the classpath or
>> > > > > client/server. If I'm not mistaken, CDH 4.x always shipped with MR v1.
>> > > > > It's a trivial fix for us and will be in the repo tomorrow morning anyway.
>> > > > > Thanks for catching this issue, Ethan.
>> > > > >
>> > > > > Regards,
>> > > > > Matthias
>> > > > >
>> > > > > From: Deron Eriksson <deroneriksson@gmail.com>
>> > > > > To: dev@systemml.incubator.apache.org
>> > > > > Date: 02/04/2016 11:04 PM
>> > > > > Subject: Re: Compatibility with MR1 Cloudera cdh4.2.1
>> > > > > ------------------------------
>> > > > >
>> > > > >
>> > > > >
>> > > > > Hi Ethan,
>> > > > >
>> > > > > Just FYI, I looked at hadoop-common-2.0.0-cdh4.2.1.jar
>> > > > > (https://repository.cloudera.com/artifactory/repo/org/apache/hadoop/hadoop-common/2.0.0-cdh4.2.1/),
>> > > > > since I don't see a 2.0.0-mr1-cdh4.2.1 version, and the
>> > > > > org.apache.hadoop.conf.Configuration class in that jar doesn't appear to
>> > > > > have a getDouble method, so using that version of hadoop-common won't work.
>> > > > >
>> > > > > However, the hadoop-common-2.4.1.jar
>> > > > > (https://repository.cloudera.com/artifactory/repo/org/apache/hadoop/hadoop-common/2.4.1/)
>> > > > > does appear to have the getDouble method. It's possible that adding that
>> > > > > jar to your classpath may fix your problem, as Shirish pointed out.
>> > > > >
>> > > > > It sounds like Matthias may have another fix.
>> > > > >
>> > > > > Deron
>> > > > >
>> > > > >
>> > > > >
>> > > > > On Thu, Feb 4, 2016 at 6:40 PM, Matthias Boehm <mboehm@us.ibm.com> wrote:
>> > > > >
>> > > > > > Well, we have indeed not run on MR v1 for a while now. However, I don't
>> > > > > > want to go so far as to say we don't support it anymore. I'll fix this
>> > > > > > particular issue by tomorrow.
>> > > > > >
>> > > > > > In the next couple of weeks we should run our full performance test suite
>> > > > > > (for broad coverage) over an MR v1 cluster and systematically remove
>> > > > > > unnecessary incompatibilities like this one. Any volunteers?
>> > > > > >
>> > > > > > Regards,
>> > > > > > Matthias
>> > > > > >
>> > > > > > From: Ethan Xu <ethan.yifanxu@gmail.com>
>> > > > > > To: dev@systemml.incubator.apache.org
>> > > > > > Date: 02/04/2016 05:51 PM
>> > > > > > Subject: Compatibility with MR1 Cloudera cdh4.2.1
>> > > > > > ------------------------------
>> > > > >
>> > > > > >
>> > > > > >
>> > > > > >
>> > > > > > Hello,
>> > > > > >
>> > > > > > I got an error when running the systemML/scripts/Univar-Stats.dml script on
>> > > > > > a hadoop cluster (Cloudera CDH4.2.1) on a 6GB data set. Error message is at
>> > > > > > the bottom of the email. The same script ran fine on a smaller sample
>> > > > > > (several MB) of the same data set, when MR was not invoked.
>> > > > > >
>> > > > > > The main error was java.lang.NoSuchMethodError:
>> > > > > > org.apache.hadoop.mapred.JobConf.getDouble()
>> > > > > > Digging deeper, it looks like the CDH4.2.1 version of MR indeed didn't have
>> > > > > > the JobConf.getDouble() method.
>> > > > > >
>> > > > > > The hadoop-core jar of CDH4.2.1 can be found here:
>> > > > > > https://repository.cloudera.com/artifactory/repo/org/apache/hadoop/hadoop-core/2.0.0-mr1-cdh4.2.1/
>> > > > > >
>> > > > > > The calling line of SystemML is line 1194 of
>> > > > > > https://github.com/apache/incubator-systemml/blob/master/src/main/java/org/apache/sysml/runtime/matrix/mapred/MRJobConfiguration.java
>> > > > > >
>> > > > > > I was wondering, if the finding is accurate, is there a potential fix, or
>> > > > > > does this mean the current version of SystemML is not compatible with
>> > > > > > CDH4.2.1?
>> > > > > >
>> > > > > > Thank you,
>> > > > > >
>> > > > > > Ethan
>> > > > > >
>> > > > > >
>> > > > > > hadoop jar $sysDir/target/SystemML.jar -f
>> > > > > > $sysDir/scripts/algorithms/Univar-Stats.dml -nvargs
>> > > > > > X=$baseDirHDFS/original-coded.csv
>> > > > > > TYPES=$baseDirHDFS/original-coded-type.csv
>> > > > > > STATS=$baseDirHDFS/univariate-summary.csv
>> > > > > >
>> > > > > > 16/02/04 20:35:03 INFO api.DMLScript: BEGIN DML run 02/04/2016 20:35:03
>> > > > > > 16/02/04 20:35:03 INFO api.DMLScript: HADOOP_HOME: null
>> > > > > > 16/02/04 20:35:03 WARN conf.DMLConfig: No default SystemML config file (./SystemML-config.xml) found
>> > > > > > 16/02/04 20:35:03 WARN conf.DMLConfig: Using default settings in DMLConfig
>> > > > > > 16/02/04 20:35:04 WARN hops.OptimizerUtils: Auto-disable multi-threaded text read for 'text' and 'csv' due to thread contention on JRE < 1.8 (java.version=1.7.0_71).
>> > > > > > SLF4J: Class path contains multiple SLF4J bindings.
>> > > > > > SLF4J: Found binding in [jar:file:/usr/lib/zookeeper/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>> > > > > > SLF4J: Found binding in [jar:file:/usr/local/explorys/datagrid/lib/slf4j-jdk14-1.6.6.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>> > > > > > SLF4J: Found binding in [jar:file:/usr/local/explorys/datagrid/lib/logback-classic-1.0.6.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>> > > > > > SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
>> > > > > > 16/02/04 20:35:07 INFO api.DMLScript: SystemML Statistics:
>> > > > > > Total execution time:        0.880 sec.
>> > > > > > Number of executed MR Jobs:    0.
>> > > > > >
>> > > > > > 16/02/04 20:35:07 INFO api.DMLScript: END DML run 02/04/2016 20:35:07
>> > > > > > Exception in thread "main" java.lang.NoSuchMethodError: org.apache.hadoop.mapred.JobConf.getDouble(Ljava/lang/String;D)D
>> > > > > >    at org.apache.sysml.runtime.matrix.mapred.MRJobConfiguration.setUpMultipleInputs(MRJobConfiguration.java:1195)
>> > > > > >    at org.apache.sysml.runtime.matrix.mapred.MRJobConfiguration.setUpMultipleInputs(MRJobConfiguration.java:1129)
>> > > > > >    at org.apache.sysml.runtime.matrix.CSVReblockMR.runAssignRowIDMRJob(CSVReblockMR.java:307)
>> > > > > >    at org.apache.sysml.runtime.matrix.CSVReblockMR.runAssignRowIDMRJob(CSVReblockMR.java:289)
>> > > > > >    at org.apache.sysml.runtime.matrix.CSVReblockMR.runJob(CSVReblockMR.java:275)
>> > > > > >    at org.apache.sysml.lops.runtime.RunMRJobs.submitJob(RunMRJobs.java:257)
>> > > > > >    at org.apache.sysml.lops.runtime.RunMRJobs.prepareAndSubmitJob(RunMRJobs.java:143)
>> > > > > >    at org.apache.sysml.runtime.instructions.MRJobInstruction.processInstruction(MRJobInstruction.java:1500)
>> > > > > >    at org.apache.sysml.runtime.controlprogram.ProgramBlock.executeSingleInstruction(ProgramBlock.java:309)
>> > > > > >    at org.apache.sysml.runtime.controlprogram.ProgramBlock.executeInstructions(ProgramBlock.java:227)
>> > > > > >    at org.apache.sysml.runtime.controlprogram.ProgramBlock.execute(ProgramBlock.java:169)
>> > > > > >    at org.apache.sysml.runtime.controlprogram.Program.execute(Program.java:146)
>> > > > > >    at org.apache.sysml.api.DMLScript.execute(DMLScript.java:676)
>> > > > > >    at org.apache.sysml.api.DMLScript.executeScript(DMLScript.java:338)
>> > > > > >    at org.apache.sysml.api.DMLScript.main(DMLScript.java:197)
>> > > > > >    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> > > > > >    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>> > > > > >    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>> > > > > >    at java.lang.reflect.Method.invoke(Method.java:606)
>> > > > > >    at org.apache.hadoop.util.RunJar.main(RunJar.java:208)

