mahout-dev mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAHOUT-1863) cluster-syntheticcontrol.sh errors out with "Input path does not exist"
Date Thu, 26 May 2016 16:28:13 GMT

    [ https://issues.apache.org/jira/browse/MAHOUT-1863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15302340#comment-15302340 ]

ASF GitHub Bot commented on MAHOUT-1863:
----------------------------------------

Github user andrewpalumbo commented on the pull request:

    https://github.com/apache/mahout/pull/235#issuecomment-221923050
  
    Oh yes, I'd forgotten that we're allowing for user-defined directories in
    these scripts now, so we don't know how deep the path will be. That simple
    alternative won't work without, as you mentioned, a loop (see the sketch
    after this comment), and these scripts are already complicated enough.
    We've discussed tearing them down completely and redoing them but haven't
    had a chance. (Would you be interested? :))

    So I'll have to test this out, but I'm for committing this as is. It needs
    to at least be working on Hadoop 2.

    We can then look at getting all the scripts back to Hadoop 1 compatibility later.
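
    (For illustration, a minimal sketch of the loop alternative mentioned
    above, assuming ${WORK_DIR} holds a plain absolute path rather than a full
    hdfs:// URI, and that $DFS wraps "hadoop fs" as in the existing scripts.
    Creating one level at a time avoids both -mkdir -p, which Hadoop 1's fs
    shell does not accept, and deep-path -mkdir, which fails on Hadoop 2.)

        # Build up ${WORK_DIR}/testdata one path component at a time so that
        # each -mkdir call only ever creates a single directory level.
        path=""
        for dir in $(echo "${WORK_DIR}/testdata" | tr '/' ' '); do
            path="${path}/${dir}"
            # suppress "already exists" errors for components that exist
            $DFS -mkdir "${path}" 2>/dev/null || true
        done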


> cluster-syntheticcontrol.sh errors out with "Input path does not exist"
> -----------------------------------------------------------------------
>
>                 Key: MAHOUT-1863
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1863
>             Project: Mahout
>          Issue Type: Bug
>    Affects Versions: 0.12.0
>            Reporter: Albert Chu
>            Priority: Minor
>
> Running cluster-syntheticcontrol.sh on 0.12.0 resulted in this error:
> {noformat}
> Exception in thread "main" org.apache.hadoop.mapreduce.lib.input.InvalidInputException:
> Input path does not exist: hdfs://apex156:54310/user/achu/testdata
> 	at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:323)
> 	at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:265)
> 	at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:387)
> 	at org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:301)
> 	at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:318)
> 	at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:196)
> 	at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
> 	at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:415)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
> 	at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287)
> 	at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1308)
> 	at org.apache.mahout.clustering.conversion.InputDriver.runJob(InputDriver.java:108)
> 	at org.apache.mahout.clustering.syntheticcontrol.fuzzykmeans.Job.run(Job.java:133)
> 	at org.apache.mahout.clustering.syntheticcontrol.fuzzykmeans.Job.main(Job.java:62)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:606)
> 	at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:71)
> 	at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144)
> 	at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:152)
> 	at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:606)
> 	at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
> 	at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
> {noformat}
> It appears cluster-syntheticcontrol.sh breaks under 0.12.0 due to this patch:
> {noformat}
> commit 23267a0bef064f3351fd879274724bcb02333c4a
> {noformat}
> One change in question:
> {noformat}
> -    $DFS -mkdir testdata
> +    $DFS -mkdir ${WORK_DIR}/testdata
> {noformat}
> now requires that the -p option be specified to -mkdir, since the parent
> directory ${WORK_DIR} may not already exist. This fix is simple; a sketch follows.
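> (A minimal sketch of that fix, assuming $DFS wraps "hadoop fs" as in the
> script:)
> {noformat}
> # -p creates any missing parent directories (here, ${WORK_DIR} itself)
> $DFS -mkdir -p ${WORK_DIR}/testdata
> {noformat}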
> Another change:
> {noformat}
> -    $DFS -put ${WORK_DIR}/synthetic_control.data testdata
> +    $DFS -put ${WORK_DIR}/synthetic_control.data ${WORK_DIR}/testdata
> {noformat}
> appears to break the example, because in:
> examples/src/main/java/org/apache/mahout/clustering/syntheticcontrol/fuzzykmeans/Job.java
> examples/src/main/java/org/apache/mahout/clustering/syntheticcontrol/kmeans/Job.java
> the input path is hard-coded as just 'testdata', so ${WORK_DIR}/testdata
> needs to be passed in explicitly as an option.
> Reverting the lines listed above fixes the problem. However, reverting
> presumably re-introduces the original problem reported in MAHOUT-1773.
> I originally attempted to fix this by simply passing the option
> "--input ${WORK_DIR}/testdata" to the command in the script. However, once
> any one option is specified, a number of other options become required too.
> I considered modifying the above Job.java files to take a minimal number of
> arguments and fall back to defaults for the rest, but that would also have
> required changes to DefaultOptionCreator.java to make currently-required
> options optional, and I didn't want to go down the path of determining which
> other examples depend on those required/non-required settings.
> So I just passed every required option in cluster-syntheticcontrol.sh to fix
> this, using whatever defaults were hard-coded into the Job.java files above
> (a sketch of such an invocation follows).
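> (For illustration, a hypothetical sketch of such an invocation; the flag
> names and default values below are placeholders that should be checked
> against the Job.java files and DefaultOptionCreator, and $MAHOUT is assumed
> to point at bin/mahout:)
> {noformat}
> # run the fuzzykmeans example with every required option spelled out,
> # mirroring the defaults hard-coded in Job.java
> $MAHOUT org.apache.mahout.clustering.syntheticcontrol.fuzzykmeans.Job \
>     --input ${WORK_DIR}/testdata \
>     --output ${WORK_DIR}/output \
>     --distanceMeasure org.apache.mahout.common.distance.EuclideanDistanceMeasure \
>     --t1 80 --t2 55 \
>     --maxIter 10 --m 2.0 --convergenceDelta 0.5
> {noformat}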
> I'm sure there's a better way to do this, and I'm happy to supply a patch,
> but I thought I'd start with this.
> Github pull request to be sent shortly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
