systemml-dev mailing list archives

From Ethan Xu <ethan.yifa...@gmail.com>
Subject Re: Compatibility with MR1 Cloudera cdh4.2.1
Date Sat, 06 Feb 2016 04:04:11 GMT
Seems it's a problem with

String taskType = (conf.getBoolean(JobContext.TASK_ISMAP, true)) ? "m" : "r";

on line 137 of

org.apache.sysml.runtime.matrix.data.MultipleOutputCommitter

conf.getBoolean(JobContext.TASK_ISMAP, true) returns 'true' (perhaps falling
back to the provided default value) even in reducers, causing 'charIx' (line
139) to get the value -1, which leads to the error.

Confirmed on my test case: when the error occurs in a reducer, 'name' has the
value '0-r-00000', but 'taskType' has the value 'm'.
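
In other words, with code roughly like the following around those lines
(paraphrased, not a verbatim copy; the substring call is the one at
MultipleOutputCommitter.java:140 in the stack trace below), a reducer output
named '0-r-00000' is searched for '-m-' and the lookup fails:

String name = file.getName();                       // e.g. "0-r-00000"
String taskType = (conf.getBoolean(JobContext.TASK_ISMAP, true)) ? "m" : "r";
int charIx = name.indexOf("-" + taskType + "-");    // -1: no "-m-" in "0-r-00000"
String taskID = name.substring(0, charIx);          // StringIndexOutOfBoundsException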

The following (ugly) hack fixed the problem; the DML script ran successfully
afterwards.

Change lines 137-139 of
    org.apache.sysml.runtime.matrix.data.MultipleOutputCommitter
to:

// requires: import java.util.regex.Matcher; import java.util.regex.Pattern;
String name = file.getName();
Pattern p = Pattern.compile("-[rm]-");
Matcher m = p.matcher(name);
int charIx = 0;
if (m.find()) {
    charIx = m.start();
} else {
    throw new RuntimeException("file name '" + name + "' doesn't contain '-r-' or '-m-'");
}

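As a quick sanity check of the pattern (standalone snippet; '0-r-00000' is
the name from my failing run, the '-m-' name is made up):

import java.util.regex.Matcher;
import java.util.regex.Pattern;

Pattern p = Pattern.compile("-[rm]-");
Matcher mr = p.matcher("0-r-00000");
Matcher mm = p.matcher("0-m-00001");
System.out.println(mr.find() ? mr.start() : -1);   // prints 1
System.out.println(mm.find() ? mm.start() : -1);   // prints 1
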
Ethan


On Fri, Feb 5, 2016 at 4:37 PM, Ethan Xu <ethan.yifanxu@gmail.com> wrote:

> Thanks, tried that and moved a bit further. Now there's a new exception
> (still in the reduce phase of 'CSV-Reblock-MR'):
>
> WARN org.apache.hadoop.mapred.Child: Error running child
> java.lang.StringIndexOutOfBoundsException: String index out of range: -1
> 	at java.lang.String.substring(String.java:1911)
> 	at org.apache.sysml.runtime.matrix.data.MultipleOutputCommitter.moveFileToDestination(MultipleOutputCommitter.java:140)
> 	at org.apache.sysml.runtime.matrix.data.MultipleOutputCommitter.moveFinalTaskOutputs(MultipleOutputCommitter.java:119)
> 	at org.apache.sysml.runtime.matrix.data.MultipleOutputCommitter.commitTask(MultipleOutputCommitter.java:94)
> 	at org.apache.hadoop.mapred.OutputCommitter.commitTask(OutputCommitter.java:221)
> 	at org.apache.hadoop.mapred.Task.commit(Task.java:1005)
> 	at org.apache.hadoop.mapred.Task.done(Task.java:875)
> 	at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:453)
> 	at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:415)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
> 	at org.apache.hadoop.mapred.Child.main(Child.java:262)
>
>
>
> On Fri, Feb 5, 2016 at 4:03 PM, Matthias Boehm <mboehm@us.ibm.com> wrote:
>
>> ok that is interesting. I think the following is happening: The hadoop
>> version is >2.0, which makes SystemML switch to the 2.x configuration
>> properties. However, because MR1 is bundled into this distribution, these
>> configurations do not exist, which makes us fail when processing task IDs.
>>
>> Workaround: change line 85 of
>> org.apache.sysml.runtime.matrix.mapred.MRConfigurationNames to
>> "boolean hadoopVersion2 = false;".
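>>
>> For context, the switch there keys off the Hadoop version string, roughly
>> along these lines (simplified sketch; VersionInfo stands in for the actual
>> lookup, the real code in MRConfigurationNames may differ):
>>
>> String version = org.apache.hadoop.util.VersionInfo.getVersion();
>> // "2.0.0-cdh4.2.1" on your cluster, hence detected as hadoop 2.x
>> boolean hadoopVersion2 = version.startsWith("2");
>>
>> which is why an MR1 bundle that reports a 2.x version string ends up with
>> the 2.x property names.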
>>
>> Regards,
>> Matthias
>>
>> From: Ethan Xu <ethan.yifanxu@gmail.com>
>> To: dev@systemml.incubator.apache.org
>> Date: 02/05/2016 12:36 PM
>> Subject: Re: Compatibility with MR1 Cloudera cdh4.2.1
>> ------------------------------
>>
>> Thank you very much. I just pulled the update, rebuilt the project and
>> reran the code.
>>
>> The method-not-found error was gone, and the MapReduce job was kicked off.
>> The 'Assign-RowID-MR' job finished successfully.
>> The map phase of the 'CSV-Reblock-MR' job finished, but the reducers threw
>> NullPointerExceptions at:
>>
>> java.lang.NullPointerException
>>     at org.apache.sysml.runtime.matrix.mapred.ReduceBase.close(ReduceBase.java:205)
>>     at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:516)
>>     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:447)
>>     at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
>>     at java.security.AccessController.doPrivileged(Native Method)
>>     at javax.security.auth.Subject.doAs(Subject.java:415)
>>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
>>     at org.apache.hadoop.mapred.Child.main(Child.java:262)
>>
>> The job I ran was the same as before, on the same data:
>>
>> hadoop jar <SystemML dir>/target/SystemML.jar
>>   -libjars <local dir>/hadoop-lzo-0.4.15.jar
>>   -f <SystemML dir>/scripts/algorithms/Univar-Stats.dml
>>   -nvargs X=<HDFS dir>/original-coded.csv
>>           TYPES=<HDFS dir>/original-coded-type.csv
>>           STATS=<HDFS dir>/univariate-summary.csv
>>
>> The hadoop cluster was also the same one: CDH4.2.1.
>>
>> Sorry to keep coming back with problems on a really old Hadoop system.
>> Please let me know what other information is needed to diagnose the issue.
>>
>> Ethan
>>
>>
>> On Fri, Feb 5, 2016 at 1:26 PM, Deron Eriksson <deroneriksson@gmail.com> wrote:
>>
>> > Hi Ethan,
>> >
>> > I believe your safest, cleanest bet is to wait for the fix from Matthias.
>> > When he pushes the fix, you will see it at
>> > https://github.com/apache/incubator-systemml/commits/master. At that point,
>> > you can pull (git pull) the changes from GitHub to your machine and then
>> > build with Maven utilizing the new changes.
>> >
>> > Alternatively, it's not really recommended, but you might be able to use
>> > -libjars to reference the hadoop-common jar, which should be in your local
>> > maven repository
>> > (.m2/repository/org/apache/hadoop/hadoop-common/2.4.1/hadoop-common-2.4.1.jar).
>> > However, mixing jar versions usually doesn't work very well (it can lead
>> > to other problems), so waiting for the fix is best.
>> >
>> > Deron
>> >
>> >
>> > On Fri, Feb 5, 2016 at 6:47 AM, Ethan Xu <ethan.yifanxu@gmail.com> wrote:
>> >
>> > > Thank you Shirish and Deron for the suggestions. Looking forward to the
>> > > fix from Matthias!
>> > >
>> > > We are using the hadoop-common shipped with CDH4.2.1, and it's on the
>> > > classpath. I'm a bit hesitant to alter our hadoop configuration to
>> > > include other versions since other people are using it too.
>> > >
>> > > Not sure if/how the following naive approach affects the program
>> > > behavior, but I did try changing the scope of
>> > >
>> > > <groupId>org.apache.hadoop</groupId>
>> > > <artifactId>hadoop-common</artifactId>
>> > > <version>${hadoop.version}</version>
>> > >
>> > > in SystemML's pom.xml from 'provided' to 'compile' and rebuilt the jar
>> > > (21MB), and it threw the same error.
>> > >
>> > > By the way, this is in pom.xml lines 65-72:
>> > >
>> > > <properties>
>> > >     <hadoop.version>2.4.1</hadoop.version>
>> > >     <antlr.version>4.3</antlr.version>
>> > >     <spark.version>1.4.1</spark.version>
>> > >
>> > >     <!-- OS-specific JVM arguments for running integration tests -->
>> > >     <integrationTestExtraJVMArgs />
>> > > </properties>
>> > >
>> > > Am I supposed to modify the hadoop.version before building?
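>> > >
>> > > (Side note: since hadoop.version is an ordinary Maven property, it can
>> > > also be overridden at build time without editing the pom, e.g.
>> > > "mvn clean package -Dhadoop.version=2.0.0-cdh4.2.1" with the Cloudera
>> > > repository configured, though I haven't verified that SystemML compiles
>> > > against that version.)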
>> > >
>> > > Thanks again,
>> > >
>> > > Ethan
>> > >
>> > > On Fri, Feb 5, 2016 at 2:29 AM, Deron Eriksson <deroneriksson@gmail.com> wrote:
>> > >
>> > > > Hi Matthias,
>> > > >
>> > > > Glad to hear the fix is simple. Mixing jar versions sometimes is not
>> > > > very fun.
>> > > >
>> > > > Deron
>> > > >
>> > > >
>> > > > On Thu, Feb 4, 2016 at 11:10 PM, Matthias Boehm <mboehm@us.ibm.com> wrote:
>> > > >
>> > > > > well, let's not mix different hadoop versions in the class path or
>> > > > > client/server. If I'm not mistaken, cdh 4.x always shipped with MR v1.
>> > > > > It's a trivial fix for us and will be in the repo tomorrow morning
>> > > > > anyway. Thanks for catching this issue, Ethan.
>> > > > >
>> > > > > Regards,
>> > > > > Matthias
>> > > > >
>> > > > > From: Deron Eriksson <deroneriksson@gmail.com>
>> > > > > To: dev@systemml.incubator.apache.org
>> > > > > Date: 02/04/2016 11:04 PM
>> > > > > Subject: Re: Compatibility with MR1 Cloudera cdh4.2.1
>> > > > > ------------------------------
>> > > > >
>> > > > > Hi Ethan,
>> > > > >
>> > > > > Just FYI, I looked at hadoop-common-2.0.0-cdh4.2.1.jar
>> > > > > (https://repository.cloudera.com/artifactory/repo/org/apache/hadoop/hadoop-common/2.0.0-cdh4.2.1/),
>> > > > > since I don't see a 2.0.0-mr1-cdh4.2.1 version, and the
>> > > > > org.apache.hadoop.conf.Configuration class in that jar doesn't appear
>> > > > > to have a getDouble method, so using that version of hadoop-common
>> > > > > won't work.
>> > > > >
>> > > > > However, the hadoop-common-2.4.1.jar
>> > > > > (https://repository.cloudera.com/artifactory/repo/org/apache/hadoop/hadoop-common/2.4.1/)
>> > > > > does appear to have the getDouble method. It's possible that adding
>> > > > > that jar to your classpath may fix your problem, as Shirish pointed
>> > > > > out.
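>> > > > >
>> > > > > A version-agnostic shim along these lines would also sidestep the
>> > > > > missing method (just a sketch, not necessarily how Matthias will
>> > > > > fix it): read the raw string value, which works on both MR1 and
>> > > > > 2.x, and parse it yourself:
>> > > > >
>> > > > > static double getDoubleCompat(Configuration conf, String key, double defaultVal) {
>> > > > >     String raw = conf.get(key);  // Configuration.get(String) exists in all Hadoop versions
>> > > > >     if (raw == null)
>> > > > >         return defaultVal;
>> > > > >     try {
>> > > > >         return Double.parseDouble(raw.trim());
>> > > > >     } catch (NumberFormatException e) {
>> > > > >         return defaultVal;
>> > > > >     }
>> > > > > }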
>> > > > >
>> > > > > It sounds like Matthias may have another fix.
>> > > > >
>> > > > > Deron
>> > > > >
>> > > > > On Thu, Feb 4, 2016 at 6:40 PM, Matthias Boehm <mboehm@us.ibm.com> wrote:
>> > > > >
>> > > > > > well, we did indeed not run on MR v1 for a while now. However, I
>> > > > > > don't want to go that far and say we don't support it anymore. I'll
>> > > > > > fix this particular issue by tomorrow.
>> > > > > >
>> > > > > > In the next couple of weeks we should run our full performance
>> > > > > > testsuite (for broad coverage) over an MR v1 cluster and
>> > > > > > systematically remove unnecessary incompatibilities like this one.
>> > > > > > Any volunteers?
>> > > > > >
>> > > > > > Regards,
>> > > > > > Matthias
>> > > > > >
>> > > > > > From: Ethan Xu <ethan.yifanxu@gmail.com>
>> > > > > > To: dev@systemml.incubator.apache.org
>> > > > > > Date: 02/04/2016 05:51 PM
>> > > > > > Subject: Compatibility with MR1 Cloudera cdh4.2.1
>> > > > > > ------------------------------
>> > > > > >
>> > > > > > Hello,
>> > > > > >
>> > > > > > I got an error when running the systemML/scripts/Univar-Stats.dml
>> > > > > > script on a hadoop cluster (Cloudera CDH4.2.1) on a 6GB data set.
>> > > > > > The error message is at the bottom of the email. The same script
>> > > > > > ran fine on a smaller sample (several MB) of the same data set,
>> > > > > > when MR was not invoked.
>> > > > > >
>> > > > > > The main error was java.lang.NoSuchMethodError:
>> > > > > > org.apache.hadoop.mapred.JobConf.getDouble().
>> > > > > > Digging deeper, it looks like the CDH4.2.1 version of MR indeed
>> > > > > > didn't have the JobConf.getDouble() method.
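>> > > > > >
>> > > > > > One quick way to double-check is a reflective probe; with the
>> > > > > > cdh4.2.1 hadoop-core jar on the classpath, this should report the
>> > > > > > method missing:
>> > > > > >
>> > > > > > try {
>> > > > > >     org.apache.hadoop.mapred.JobConf.class
>> > > > > >         .getMethod("getDouble", String.class, double.class);
>> > > > > >     System.out.println("getDouble(String, double) present");
>> > > > > > } catch (NoSuchMethodException e) {
>> > > > > >     System.out.println("getDouble(String, double) missing");
>> > > > > > }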
>> > > > > >
>> > > > > > The hadoop-core jar of CDH4.2.1 can be found here:
>> > > > > > https://repository.cloudera.com/artifactory/repo/org/apache/hadoop/hadoop-core/2.0.0-mr1-cdh4.2.1/
>> > > > > >
>> > > > > > The calling line of SystemML is line 1194 of
>> > > > > > https://github.com/apache/incubator-systemml/blob/master/src/main/java/org/apache/sysml/runtime/matrix/mapred/MRJobConfiguration.java
>> > > > > >
>> > > > > > I was wondering: if the finding is accurate, is there a potential
>> > > > > > fix, or does this mean the current version of SystemML is not
>> > > > > > compatible with CDH4.2.1?
>> > > > > >
>> > > > > > Thank you,
>> > > > > >
>> > > > > > Ethan
>> > > > > >
>> > > > > >
>> > > > > > hadoop jar $sysDir/target/SystemML.jar -f
>> > > > > > $sysDir/scripts/algorithms/Univar-Stats.dml -nvargs
>> > > > > > X=$baseDirHDFS/original-coded.csv
>> > > > > > TYPES=$baseDirHDFS/original-coded-type.csv
>> > > > > > STATS=$baseDirHDFS/univariate-summary.csv
>> > > > > >
>> > > > > > 16/02/04 20:35:03 INFO api.DMLScript: BEGIN DML run 02/04/2016 20:35:03
>> > > > > > 16/02/04 20:35:03 INFO api.DMLScript: HADOOP_HOME: null
>> > > > > > 16/02/04 20:35:03 WARN conf.DMLConfig: No default SystemML config file (./SystemML-config.xml) found
>> > > > > > 16/02/04 20:35:03 WARN conf.DMLConfig: Using default settings in DMLConfig
>> > > > > > 16/02/04 20:35:04 WARN hops.OptimizerUtils: Auto-disable multi-threaded text read for 'text' and 'csv' due to thread contention on JRE < 1.8 (java.version=1.7.0_71).
>> > > > > > SLF4J: Class path contains multiple SLF4J bindings.
>> > > > > > SLF4J: Found binding in [jar:file:/usr/lib/zookeeper/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>> > > > > > SLF4J: Found binding in [jar:file:/usr/local/explorys/datagrid/lib/slf4j-jdk14-1.6.6.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>> > > > > > SLF4J: Found binding in [jar:file:/usr/local/explorys/datagrid/lib/logback-classic-1.0.6.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>> > > > > > SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
>> > > > > > 16/02/04 20:35:07 INFO api.DMLScript: SystemML Statistics:
>> > > > > > Total execution time:        0.880 sec.
>> > > > > > Number of executed MR Jobs:    0.
>> > > > > >
>> > > > > > 16/02/04 20:35:07 INFO api.DMLScript: END DML run 02/04/2016 20:35:07
>> > > > > > Exception in thread "main" java.lang.NoSuchMethodError:
>> > > > > > org.apache.hadoop.mapred.JobConf.getDouble(Ljava/lang/String;D)D
>> > > > > >    at org.apache.sysml.runtime.matrix.mapred.MRJobConfiguration.setUpMultipleInputs(MRJobConfiguration.java:1195)
>> > > > > >    at org.apache.sysml.runtime.matrix.mapred.MRJobConfiguration.setUpMultipleInputs(MRJobConfiguration.java:1129)
>> > > > > >    at org.apache.sysml.runtime.matrix.CSVReblockMR.runAssignRowIDMRJob(CSVReblockMR.java:307)
>> > > > > >    at org.apache.sysml.runtime.matrix.CSVReblockMR.runAssignRowIDMRJob(CSVReblockMR.java:289)
>> > > > > >    at org.apache.sysml.runtime.matrix.CSVReblockMR.runJob(CSVReblockMR.java:275)
>> > > > > >    at org.apache.sysml.lops.runtime.RunMRJobs.submitJob(RunMRJobs.java:257)
>> > > > > >    at org.apache.sysml.lops.runtime.RunMRJobs.prepareAndSubmitJob(RunMRJobs.java:143)
>> > > > > >    at org.apache.sysml.runtime.instructions.MRJobInstruction.processInstruction(MRJobInstruction.java:1500)
>> > > > > >    at org.apache.sysml.runtime.controlprogram.ProgramBlock.executeSingleInstruction(ProgramBlock.java:309)
>> > > > > >    at org.apache.sysml.runtime.controlprogram.ProgramBlock.executeInstructions(ProgramBlock.java:227)
>> > > > > >    at org.apache.sysml.runtime.controlprogram.ProgramBlock.execute(ProgramBlock.java:169)
>> > > > > >    at org.apache.sysml.runtime.controlprogram.Program.execute(Program.java:146)
>> > > > > >    at org.apache.sysml.api.DMLScript.execute(DMLScript.java:676)
>> > > > > >    at org.apache.sysml.api.DMLScript.executeScript(DMLScript.java:338)
>> > > > > >    at org.apache.sysml.api.DMLScript.main(DMLScript.java:197)
>> > > > > >    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> > > > > >    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>> > > > > >    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>> > > > > >    at java.lang.reflect.Method.invoke(Method.java:606)
>> > > > > >    at org.apache.hadoop.util.RunJar.main(RunJar.java:208)