Subject: Re: FileSystem Error
From: Azuryy Yu <azuryyyu@gmail.com>
To: user@hadoop.apache.org
Cc: user@mahout.apache.org
Date: Sat, 30 Mar 2013 07:56:46 +0800

Use hadoop jar instead of java -jar. The hadoop script sets up a proper classpath for you.
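For illustration (the jar name below is hypothetical), the difference between the two invocations is roughly:

    java -jar myjob.jar                   # only the jar's own classpath; the Hadoop conf
                                          # directory (core-site.xml etc.) is not on it
    hadoop jar myjob.jar DataFileWriter   # the hadoop script puts $HADOOP_CONF_DIR and the
                                          # Hadoop libraries on the classpath before launching

With hadoop jar, a plain "new Configuration()" already picks up fs.default.name from core-site.xml, so relative paths resolve against HDFS instead of file:///.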
On Mar 29, 2013 11:55 PM, "Cyril Bogus" <cyrilbogus@gmail.com> wrote:

> Hi,
>
> I am running a small Java program that writes a small input data set to the
> Hadoop FileSystem, runs Mahout Canopy and KMeans clustering, and then prints
> the contents of the clustered data.
>
> In my hadoop.properties I have included the core-site.xml definition so that
> the Java program connects to my single-node setup and does not use the local
> project filesystem but Hadoop instead (basically, all writes and reads are
> done on Hadoop and not on the local class path).
>
> When I run the program, as soon as the Canopy (and likewise the KMeans) step
> starts, the configuration tries to look up the file on the class path instead
> of the Hadoop FileSystem path where the proper files are located.
>
> Is there a problem with the way I have my conf defined?
>
> hadoop.properties:
> fs.default.name=hdfs//mylocation
>
> Program:
>
> public class DataFileWriter {
>
>     private static Properties props = new Properties();
>     private static Configuration conf = new Configuration();
>
>     /**
>      * @param args
>      * @throws ClassNotFoundException
>      * @throws InterruptedException
>      * @throws IOException
>      */
>     public static void main(String[] args) throws IOException,
>             InterruptedException, ClassNotFoundException {
>
>         props.load(new FileReader(new File(
>                 "/home/cyril/workspace/Newer/src/hadoop.properties")));
>
>         FileSystem fs = null;
>         SequenceFile.Writer writer;
>         SequenceFile.Reader reader;
>
>         conf.set("fs.default.name", props.getProperty("fs.default.name"));
>
>         List<NamedVector> vectors = new LinkedList<NamedVector>();
>         NamedVector v1 = new NamedVector(new DenseVector(new double[] { 0.1,
>                 0.2, 0.5 }), "Hello");
>         vectors.add(v1);
>         v1 = new NamedVector(new DenseVector(new double[] { 0.5, 0.1, 0.2 }),
>                 "Bored");
>         vectors.add(v1);
>         v1 = new NamedVector(new DenseVector(new double[] { 0.2, 0.5, 0.1 }),
>                 "Done");
>         vectors.add(v1);
>
>         // Write the data to a SequenceFile
>         try {
>             fs = FileSystem.get(conf);
>
>             Path path = new Path("testdata_seq/data");
>             writer = new SequenceFile.Writer(fs, conf, path, Text.class,
>                     VectorWritable.class);
>
>             VectorWritable vec = new VectorWritable();
>             for (NamedVector vector : vectors) {
>                 vec.set(vector);
>                 writer.append(new Text(vector.getName()), vec);
>             }
>             writer.close();
>         } catch (Exception e) {
>             System.out.println("ERROR: " + e);
>         }
>
>         // Run Canopy to seed the clusters, then KMeans on the same input
>         Path input = new Path("testdata_seq/data");
>         boolean runSequential = false;
>         Path clustersOut = new Path("testdata_seq/clusters");
>         Path clustersIn = new Path("testdata_seq/clusters/clusters-0-final");
>         double convergenceDelta = 0;
>         double clusterClassificationThreshold = 0;
>         boolean runClustering = true;
>         Path output = new Path("testdata_seq/output");
>         int maxIterations = 12;
>         CanopyDriver.run(conf, input, clustersOut,
>                 new EuclideanDistanceMeasure(), 1, 1, 1, 1, 0, runClustering,
>                 clusterClassificationThreshold, runSequential);
>         KMeansDriver.run(conf, input, clustersIn, output,
>                 new EuclideanDistanceMeasure(), convergenceDelta,
>                 maxIterations, runClustering,
>                 clusterClassificationThreshold, runSequential);
>
>         // Print each clustered point together with its cluster id
>         reader = new SequenceFile.Reader(fs,
>                 new Path("testdata_seq/clusteredPoints/part-m-00000"), conf);
>
>         IntWritable key = new IntWritable();
>         WeightedVectorWritable value = new WeightedVectorWritable();
>         while (reader.next(key, value)) {
>             System.out.println(value.toString() + " belongs to cluster "
>                     + key.toString());
>         }
>     }
> }
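As an aside, the fs.default.name value above looks as if it is missing the scheme separator (hdfs://host:port rather than hdfs//...), and the error output below shows the input path resolving against file:/, i.e. the local filesystem. One quick way to see which filesystem a Configuration actually resolves to is a small probe like the following sketch (the namenode URI here is only a hypothetical placeholder):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class FsCheck {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            conf.set("fs.default.name", "hdfs://localhost:9000"); // hypothetical namenode URI
            FileSystem fs = FileSystem.get(conf);
            // Prints hdfs://... when HDFS is picked up, file:/// when the job
            // is still running against the local filesystem.
            System.out.println(fs.getUri());
            // Relative paths such as "testdata_seq/data" are resolved against this
            // filesystem and working directory, which is where the jobs will look.
            System.out.println(fs.makeQualified(new Path("testdata_seq/data")));
        }
    }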
> Error Output:
>
> .......
> 13/03/29 11:47:15 ERROR security.UserGroupInformation: PriviledgedActionException
> as:cyril cause:org.apache.hadoop.mapreduce.lib.input.InvalidInputException:
> Input path does not exist: file:/home/cyril/workspace/Newer/testdata_seq/data
> Exception in thread "main" org.apache.hadoop.mapreduce.lib.input.InvalidInputException:
> Input path does not exist: file:/home/cyril/workspace/Newer/testdata_seq/data
>     at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:235)
>     at org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat.listStatus(SequenceFileInputFormat.java:55)
>     at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:252)
>     at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:962)
>     at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:979)
>     at org.apache.hadoop.mapred.JobClient.access$600(JobClient.java:174)
>     at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:897)
>     at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:850)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:416)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
>     at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:850)
>     at org.apache.hadoop.mapreduce.Job.submit(Job.java:500)
>     at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:530)
>     at org.apache.mahout.clustering.classify.ClusterClassificationDriver.classifyClusterMR(ClusterClassificationDriver.java:275)
>     at org.apache.mahout.clustering.classify.ClusterClassificationDriver.run(ClusterClassificationDriver.java:135)
>     at org.apache.mahout.clustering.canopy.CanopyDriver.clusterData(CanopyDriver.java:372)
>     at org.apache.mahout.clustering.canopy.CanopyDriver.run(CanopyDriver.java:158)
>     at DataFileWriter.main(DataFileWriter.java:85)
>
> On another note: is there a command that would allow the program to overwrite
> existing files in the filesystem? (I get errors if I don't delete the files
> before running the program again.)
>
> Thank you for a reply, and I hope I have given all the necessary output. In
> the meantime I will keep looking into it.
>
> Cyril
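Regarding the overwrite question above: one common approach is simply to delete the output directories before the next run, since the jobs error out when those paths already exist. A minimal sketch, reusing the paths from the program above (the wrapper class is only for illustration, and Mahout's HadoopUtil.delete helper does much the same thing, if memory serves):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class CleanOutputDirs {
        public static void main(String[] args) throws Exception {
            // Assumes core-site.xml is on the classpath (e.g. launched via hadoop jar),
            // so FileSystem.get() returns HDFS rather than the local filesystem.
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);
            // Recursively remove the previous run's output; delete() returns
            // false (rather than failing) when the path does not exist.
            for (String dir : new String[] { "testdata_seq/clusters",
                    "testdata_seq/output", "testdata_seq/clusteredPoints" }) {
                fs.delete(new Path(dir), true); // true = recursive
            }
        }
    }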
