Subject: Re: FileSystem Error
From: Azuryy Yu <azuryyyu@gmail.com>
To: user@hadoop.apache.org
Cc: user@mahout.apache.org
Date: Sat, 30 Mar 2013 07:56:46 +0800

Use hadoop jar instead of java -jar. The hadoop script sets up a proper classpath for you.
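For illustration (the jar name below is hypothetical), the difference between the two invocations is roughly:

    java -jar myjob.jar                   # only the jar's own classpath; the Hadoop conf
                                          # directory (core-site.xml etc.) is not on it
    hadoop jar myjob.jar DataFileWriter   # the hadoop script puts $HADOOP_CONF_DIR and the
                                          # Hadoop libraries on the classpath before launching

With hadoop jar, a plain "new Configuration()" already picks up fs.default.name from core-site.xml, so relative paths resolve against HDFS instead of file:///.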
On Mar 29, 2013 11:55 PM, "Cyril Bogus" <cyrilbogus@gmail.com> wrote:

> Hi,
>
> I am running a small Java program that writes a small input data set to the
> Hadoop FileSystem, runs Mahout Canopy and KMeans clustering, and then prints
> the contents of the clustered data.
>
> In my hadoop.properties I have included the core-site.xml definition so that
> the Java program connects to my single-node setup and does not use the local
> project filesystem but Hadoop instead (basically, all writes and reads are
> done on Hadoop and not on the local class path).
>
> When I run the program, as soon as the Canopy (and likewise the KMeans) step
> starts, the configuration tries to look up the file on the class path instead
> of the Hadoop FileSystem path where the proper files are located.
>
> Is there a problem with the way I have my conf defined?
>
> hadoop.properties:
> fs.default.name=hdfs//mylocation
>
> Program:
>
> public class DataFileWriter {
>
>     private static Properties props = new Properties();
>     private static Configuration conf = new Configuration();
>
>     /**
>      * @param args
>      * @throws ClassNotFoundException
>      * @throws InterruptedException
>      * @throws IOException
>      */
>     public static void main(String[] args) throws IOException,
>             InterruptedException, ClassNotFoundException {
>
>         props.load(new FileReader(new File(
>                 "/home/cyril/workspace/Newer/src/hadoop.properties")));
>
>         FileSystem fs = null;
>         SequenceFile.Writer writer;
>         SequenceFile.Reader reader;
>
>         conf.set("fs.default.name", props.getProperty("fs.default.name"));
>
>         List<NamedVector> vectors = new LinkedList<NamedVector>();
>         NamedVector v1 = new NamedVector(new DenseVector(new double[] { 0.1,
>                 0.2, 0.5 }), "Hello");
>         vectors.add(v1);
>         v1 = new NamedVector(new DenseVector(new double[] { 0.5, 0.1, 0.2 }),
>                 "Bored");
>         vectors.add(v1);
>         v1 = new NamedVector(new DenseVector(new double[] { 0.2, 0.5, 0.1 }),
>                 "Done");
>         vectors.add(v1);
>
>         // Write the data to a SequenceFile
>         try {
>             fs = FileSystem.get(conf);
>
>             Path path = new Path("testdata_seq/data");
>             writer = new SequenceFile.Writer(fs, conf, path, Text.class,
>                     VectorWritable.class);
>
>             VectorWritable vec = new VectorWritable();
>             for (NamedVector vector : vectors) {
>                 vec.set(vector);
>                 writer.append(new Text(vector.getName()), vec);
>             }
>             writer.close();
>         } catch (Exception e) {
>             System.out.println("ERROR: " + e);
>         }
>
>         // Run Canopy to seed the clusters, then KMeans on the same input
>         Path input = new Path("testdata_seq/data");
>         boolean runSequential = false;
>         Path clustersOut = new Path("testdata_seq/clusters");
>         Path clustersIn = new Path("testdata_seq/clusters/clusters-0-final");
>         double convergenceDelta = 0;
>         double clusterClassificationThreshold = 0;
>         boolean runClustering = true;
>         Path output = new Path("testdata_seq/output");
>         int maxIterations = 12;
>         CanopyDriver.run(conf, input, clustersOut,
>                 new EuclideanDistanceMeasure(), 1, 1, 1, 1, 0, runClustering,
>                 clusterClassificationThreshold, runSequential);
>         KMeansDriver.run(conf, input, clustersIn, output,
>                 new EuclideanDistanceMeasure(), convergenceDelta,
>                 maxIterations, runClustering,
>                 clusterClassificationThreshold, runSequential);
>
>         // Print each clustered point together with its cluster id
>         reader = new SequenceFile.Reader(fs,
>                 new Path("testdata_seq/clusteredPoints/part-m-00000"), conf);
>
>         IntWritable key = new IntWritable();
>         WeightedVectorWritable value = new WeightedVectorWritable();
>         while (reader.next(key, value)) {
>             System.out.println(value.toString() + " belongs to cluster "
>                     + key.toString());
>         }
>     }
> }
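As an aside, the fs.default.name value above looks as if it is missing the scheme separator (hdfs://host:port rather than hdfs//...), and the error output below shows the input path resolving against file:/, i.e. the local filesystem. One quick way to see which filesystem a Configuration actually resolves to is a small probe like the following sketch (the namenode URI here is only a hypothetical placeholder):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class FsCheck {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            conf.set("fs.default.name", "hdfs://localhost:9000"); // hypothetical namenode URI
            FileSystem fs = FileSystem.get(conf);
            // Prints hdfs://... when HDFS is picked up, file:/// when the job
            // is still running against the local filesystem.
            System.out.println(fs.getUri());
            // Relative paths such as "testdata_seq/data" are resolved against this
            // filesystem and working directory, which is where the jobs will look.
            System.out.println(fs.makeQualified(new Path("testdata_seq/data")));
        }
    }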
> Error Output:
>
> .......
> 13/03/29 11:47:15 ERROR security.UserGroupInformation: PriviledgedActionException
> as:cyril cause:org.apache.hadoop.mapreduce.lib.input.InvalidInputException:
> Input path does not exist: file:/home/cyril/workspace/Newer/testdata_seq/data
> Exception in thread "main" org.apache.hadoop.mapreduce.lib.input.InvalidInputException:
> Input path does not exist: file:/home/cyril/workspace/Newer/testdata_seq/data
>     at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:235)
>     at org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat.listStatus(SequenceFileInputFormat.java:55)
>     at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:252)
>     at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:962)
>     at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:979)
>     at org.apache.hadoop.mapred.JobClient.access$600(JobClient.java:174)
>     at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:897)
>     at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:850)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:416)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
>     at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:850)
>     at org.apache.hadoop.mapreduce.Job.submit(Job.java:500)
>     at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:530)
>     at org.apache.mahout.clustering.classify.ClusterClassificationDriver.classifyClusterMR(ClusterClassificationDriver.java:275)
>     at org.apache.mahout.clustering.classify.ClusterClassificationDriver.run(ClusterClassificationDriver.java:135)
>     at org.apache.mahout.clustering.canopy.CanopyDriver.clusterData(CanopyDriver.java:372)
>     at org.apache.mahout.clustering.canopy.CanopyDriver.run(CanopyDriver.java:158)
>     at DataFileWriter.main(DataFileWriter.java:85)
>
> On another note: is there a command that would allow the program to overwrite
> existing files in the filesystem? (I get errors if I don't delete the files
> before running the program again.)
>
> Thank you for a reply, and I hope I have given all the necessary output. In
> the meantime I will keep looking into it.
>
> Cyril
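Regarding the overwrite question above: one common approach is simply to delete the output directories before the next run, since the jobs error out when those paths already exist. A minimal sketch, reusing the paths from the program above (the wrapper class is only for illustration, and Mahout's HadoopUtil.delete helper does much the same thing, if memory serves):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class CleanOutputDirs {
        public static void main(String[] args) throws Exception {
            // Assumes core-site.xml is on the classpath (e.g. launched via hadoop jar),
            // so FileSystem.get() returns HDFS rather than the local filesystem.
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);
            // Recursively remove the previous run's output; delete() returns
            // false (rather than failing) when the path does not exist.
            for (String dir : new String[] { "testdata_seq/clusters",
                    "testdata_seq/output", "testdata_seq/clusteredPoints" }) {
                fs.delete(new Path(dir), true); // true = recursive
            }
        }
    }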
