Date: Thu, 26 May 2016 00:43:12 +0000 (UTC)
From: "Albert Chu (JIRA)"
To: dev@mahout.apache.org
Reply-To: dev@mahout.apache.org
Subject: [jira] [Updated] (MAHOUT-1863) cluster-syntheticcontrol.sh errors out with "Input path does not exist"
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8

     [ https://issues.apache.org/jira/browse/MAHOUT-1863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Albert Chu updated MAHOUT-1863:
-------------------------------

Description:

Running cluster-syntheticcontrol.sh on 0.12.0 resulted in this error:

{noformat}
Exception in thread "main"
org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: hdfs://apex156:54310/user/achu/testdata
	at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:323)
	at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:265)
	at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:387)
	at org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:301)
	at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:318)
	at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:196)
	at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
	at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
	at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287)
	at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1308)
	at org.apache.mahout.clustering.conversion.InputDriver.runJob(InputDriver.java:108)
	at org.apache.mahout.clustering.syntheticcontrol.fuzzykmeans.Job.run(Job.java:133)
	at org.apache.mahout.clustering.syntheticcontrol.fuzzykmeans.Job.main(Job.java:62)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:71)
	at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144)
	at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:152)
	at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
{noformat}

It appears cluster-syntheticcontrol.sh breaks under 0.12.0 due to the patch

{noformat}
commit 23267a0bef064f3351fd879274724bcb02333c4a
{noformat}

One change in question,

{noformat}
- $DFS -mkdir testdata
+ $DFS -mkdir ${WORK_DIR}/testdata
{noformat}

now requires that the -p option be passed to -mkdir. This fix is simple.

Another change,

{noformat}
- $DFS -put ${WORK_DIR}/synthetic_control.data testdata
+ $DFS -put ${WORK_DIR}/synthetic_control.data ${WORK_DIR}/testdata
{noformat}

appears to break the example because in

examples/src/main/java/org/apache/mahout/clustering/syntheticcontrol/fuzzykmeans/Job.java
examples/src/main/java/org/apache/mahout/clustering/syntheticcontrol/kmeans/Job.java

the input path is hard coded as just 'testdata', so ${WORK_DIR}/testdata needs to be passed in as an option.

Reverting the lines listed above fixes the problem. However, reverting presumably reintroduces the original problem reported in MAHOUT-1773.

I originally attempted to fix this by simply passing the option "--input ${WORK_DIR}/testdata" to the command in the script. However, once any one option is specified, a number of other options become required as well.

I considered modifying the above Job.java files to take a minimal number of arguments and fall back to defaults for the rest, but that would also have required changes to DefaultOptionCreator.java to make currently required options optional, and I didn't want to go down the path of determining which other examples depend on those options being required or not.
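For illustration only, the two changes described above can be sketched in shell. This is a hypothetical sketch, not the actual patch: the $DFS variable, the mahout driver invocation, and the --input/--output flag names are assumptions modeled on the script's conventions, and the real fix must pass every option the Job classes require.

```shell
# Hypothetical sketch of the two fixes; not the actual patch.
WORK_DIR=/tmp/mahout-work
DFS="hadoop fs"

# Fix 1 (assumed $DFS form): with the path now nested under ${WORK_DIR},
# -mkdir needs -p so parent directories are created too. Commented out
# because it needs a live HDFS cluster:
# $DFS -mkdir -p ${WORK_DIR}/testdata
# $DFS -put ${WORK_DIR}/synthetic_control.data ${WORK_DIR}/testdata

# Fix 2: pass the work-directory input path explicitly instead of relying
# on the hard-coded 'testdata' default in the example Job classes. The
# flag names here are assumptions; additional required options omitted:
CMD="mahout org.apache.mahout.clustering.syntheticcontrol.kmeans.Job \
  --input ${WORK_DIR}/testdata \
  --output ${WORK_DIR}/output"
echo "$CMD"
```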
So I just passed in every required option into cluster-syntheticcontrol.sh to fix this, using whatever defaults were hard coded into the Job.java files above.

I'm sure there's a better way to do this, and I'm happy to supply a patch, but thought I'd start with this. Github pull request to be sent shortly.


> cluster-syntheticcontrol.sh errors out with "Input path does not exist"
> -----------------------------------------------------------------------
>
>                 Key: MAHOUT-1863
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1863
>             Project: Mahout
>          Issue Type: Bug
>    Affects Versions: 0.12.0
>            Reporter: Albert Chu
>            Priority: Minor
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)