Date: Thu, 26 May 2016 00:43:12 +0000 (UTC)
From: "Albert Chu (JIRA)"
To: dev@mahout.apache.org
Reply-To: dev@mahout.apache.org
Subject: [jira] [Updated] (MAHOUT-1863) cluster-syntheticcontrol.sh errors out with "Input path does not exist"
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8

     [ https://issues.apache.org/jira/browse/MAHOUT-1863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Albert Chu updated MAHOUT-1863:
-------------------------------

Description:

Running cluster-syntheticcontrol.sh on 0.12.0 resulted in this error:

{noformat}
Exception in thread "main"
org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: hdfs://apex156:54310/user/achu/testdata
	at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:323)
	at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:265)
	at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:387)
	at org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:301)
	at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:318)
	at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:196)
	at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
	at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
	at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287)
	at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1308)
	at org.apache.mahout.clustering.conversion.InputDriver.runJob(InputDriver.java:108)
	at org.apache.mahout.clustering.syntheticcontrol.fuzzykmeans.Job.run(Job.java:133)
	at org.apache.mahout.clustering.syntheticcontrol.fuzzykmeans.Job.main(Job.java:62)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:71)
	at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144)
	at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:152)
	at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
{noformat}

It appears cluster-syntheticcontrol.sh breaks under 0.12.0 due to the patch

{noformat}
commit 23267a0bef064f3351fd879274724bcb02333c4a
{noformat}

One change in question,

{noformat}
- $DFS -mkdir testdata
+ $DFS -mkdir ${WORK_DIR}/testdata
{noformat}

now requires that the -p option be passed to -mkdir. This fix is simple.

Another change,

{noformat}
- $DFS -put ${WORK_DIR}/synthetic_control.data testdata
+ $DFS -put ${WORK_DIR}/synthetic_control.data ${WORK_DIR}/testdata
{noformat}

appears to break the example because in

examples/src/main/java/org/apache/mahout/clustering/syntheticcontrol/fuzzykmeans/Job.java
examples/src/main/java/org/apache/mahout/clustering/syntheticcontrol/kmeans/Job.java

the input path is hard coded as just 'testdata', so ${WORK_DIR}/testdata needs to be passed in as an option.

Reverting the lines listed above fixes the problem. However, reverting presumably reintroduces the original problem reported in MAHOUT-1773.

I originally attempted to fix this by simply passing the option "--input ${WORK_DIR}/testdata" to the command in the script. However, once any one option is specified, a number of other options become required as well.

I considered modifying the above Job.java files to take a minimal number of arguments and fall back to defaults for the rest, but that would also have required changes to DefaultOptionCreator.java to make currently required options optional, and I didn't want to go down the path of determining which other examples depend on those options being required or not.
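For illustration only, the two changes described above can be sketched in shell. This is a hypothetical sketch, not the actual patch: the $DFS variable, the mahout driver invocation, and the --input/--output flag names are assumptions modeled on the script's conventions, and the real fix must pass every option the Job classes require.

```shell
# Hypothetical sketch of the two fixes; not the actual patch.
WORK_DIR=/tmp/mahout-work
DFS="hadoop fs"

# Fix 1 (assumed $DFS form): with the path now nested under ${WORK_DIR},
# -mkdir needs -p so parent directories are created too. Commented out
# because it needs a live HDFS cluster:
# $DFS -mkdir -p ${WORK_DIR}/testdata
# $DFS -put ${WORK_DIR}/synthetic_control.data ${WORK_DIR}/testdata

# Fix 2: pass the work-directory input path explicitly instead of relying
# on the hard-coded 'testdata' default in the example Job classes. The
# flag names here are assumptions; additional required options omitted:
CMD="mahout org.apache.mahout.clustering.syntheticcontrol.kmeans.Job \
  --input ${WORK_DIR}/testdata \
  --output ${WORK_DIR}/output"
echo "$CMD"
```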
So I just passed in every required option into cluster-syntheticcontrol.sh to fix this, using whatever defaults were hard coded into the Job.java files above.

I'm sure there's a better way to do this, and I'm happy to supply a patch, but thought I'd start with this. Github pull request to be sent shortly.


> cluster-syntheticcontrol.sh errors out with "Input path does not exist"
> -----------------------------------------------------------------------
>
>                 Key: MAHOUT-1863
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1863
>             Project: Mahout
>          Issue Type: Bug
>    Affects Versions: 0.12.0
>            Reporter: Albert Chu
>            Priority: Minor
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)