hadoop-hdfs-user mailing list archives

From Azuryy Yu <azury...@gmail.com>
Subject Re: FileSystem Error
Date Fri, 29 Mar 2013 23:56:46 GMT
Use hadoop jar instead of java -jar.

The hadoop script sets up a proper classpath (Hadoop's jars and its conf
directory) for you. With java -jar, Configuration typically cannot find a
core-site.xml on the classpath and falls back to the local file system
(file:///), which matches the file:/home/cyril/... paths in your error.
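For example (a minimal sketch; the jar name and main class below are
placeholders for your own build):

    jar cf datafilewriter.jar DataFileWriter*.class
    hadoop jar datafilewriter.jar DataFileWriter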
On Mar 29, 2013 11:55 PM, "Cyril Bogus" <cyrilbogus@gmail.com> wrote:

> Hi,
>
> I am running a small Java program that writes some small input data to the
> Hadoop FileSystem, runs Mahout Canopy and KMeans clustering, and then
> outputs the content of the data.
>
> In my hadoop.properties I have included the core-site.xml definition so
> that the Java program connects to my single-node setup rather than the
> local project file system (basically, all reads and writes should go
> through Hadoop and not the local class-file directory).
>
> When I run the program, as soon as Canopy (and likewise KMeans) starts,
> the configuration looks for the input file on the local class path instead
> of the Hadoop FileSystem path where the proper files are located.
>
> Is there a problem with the way I have my conf defined?
>
> hadoop.properties:
> fs.default.name=hdfs//mylocation
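> (For comparison, I understand fs.default.name normally takes a full URI of
> the form hdfs://namenode-host:port, e.g. hdfs://localhost:9000 on a typical
> single-node setup; host and port here are only placeholders.)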
>
> Program:
>
> import java.io.File;
> import java.io.FileReader;
> import java.io.IOException;
> import java.util.LinkedList;
> import java.util.List;
> import java.util.Properties;
>
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.fs.FileSystem;
> import org.apache.hadoop.fs.Path;
> import org.apache.hadoop.io.IntWritable;
> import org.apache.hadoop.io.SequenceFile;
> import org.apache.hadoop.io.Text;
> import org.apache.mahout.clustering.canopy.CanopyDriver;
> import org.apache.mahout.clustering.classify.WeightedVectorWritable;
> import org.apache.mahout.clustering.kmeans.KMeansDriver;
> import org.apache.mahout.common.distance.EuclideanDistanceMeasure;
> import org.apache.mahout.math.DenseVector;
> import org.apache.mahout.math.NamedVector;
> import org.apache.mahout.math.VectorWritable;
>
> public class DataFileWriter {
>
>     private static Properties props = new Properties();
>     private static Configuration conf = new Configuration();
>
>     /**
>      * @param args
>      * @throws ClassNotFoundException
>      * @throws InterruptedException
>      * @throws IOException
>      */
>     public static void main(String[] args) throws IOException,
>             InterruptedException, ClassNotFoundException {
>
>         props.load(new FileReader(new File(
>                 "/home/cyril/workspace/Newer/src/hadoop.properties")));
>
>         FileSystem fs = null;
>         SequenceFile.Writer writer;
>         SequenceFile.Reader reader;
>
>         // Point the default FileSystem at the value from hadoop.properties
>         conf.set("fs.default.name", props.getProperty("fs.default.name"));
>
>         List<NamedVector> vectors = new LinkedList<NamedVector>();
>         NamedVector v1 = new NamedVector(new DenseVector(new double[] {
>                 0.1, 0.2, 0.5 }), "Hello");
>         vectors.add(v1);
>         v1 = new NamedVector(new DenseVector(new double[] {
>                 0.5, 0.1, 0.2 }), "Bored");
>         vectors.add(v1);
>         v1 = new NamedVector(new DenseVector(new double[] {
>                 0.2, 0.5, 0.1 }), "Done");
>         vectors.add(v1);
>         // Write the data to SequenceFile
>         try {
>             fs = FileSystem.get(conf);
>
>             Path path = new Path("testdata_seq/data");
>             writer = new SequenceFile.Writer(fs, conf, path, Text.class,
>                     VectorWritable.class);
>
>             VectorWritable vec = new VectorWritable();
>             for (NamedVector vector : vectors) {
>                 vec.set(vector);
>                 writer.append(new Text(vector.getName()), vec);
>             }
>             writer.close();
>
>         } catch (Exception e) {
>             System.out.println("ERROR: " + e);
>         }
>
>         Path input = new Path("testdata_seq/data");
>         boolean runSequential = false;
>         Path clustersOut = new Path("testdata_seq/clusters");
>         Path clustersIn = new Path("testdata_seq/clusters/clusters-0-final");
>         double convergenceDelta = 0;
>         double clusterClassificationThreshold = 0;
>         boolean runClustering = true;
>         Path output = new Path("testdata_seq/output");
>         int maxIterations = 12;
>
>         // Canopy builds the initial clusters, then KMeans refines them
>         CanopyDriver.run(conf, input, clustersOut,
>                 new EuclideanDistanceMeasure(), 1, 1, 1, 1, 0, runClustering,
>                 clusterClassificationThreshold, runSequential);
>         KMeansDriver.run(conf, input, clustersIn, output,
>                 new EuclideanDistanceMeasure(), convergenceDelta,
>                 maxIterations, runClustering, clusterClassificationThreshold,
>                 runSequential);
>
>         // Read back the clustered points
>         reader = new SequenceFile.Reader(fs,
>                 new Path("testdata_seq/clusteredPoints/part-m-00000"), conf);
>
>         IntWritable key = new IntWritable();
>         WeightedVectorWritable value = new WeightedVectorWritable();
>         while (reader.next(key, value)) {
>           System.out.println(value.toString() + " belongs to cluster "
>                              + key.toString());
>         }
>     }
>
> }
>
> Error Output:
>
> .......
> 13/03/29 11:47:15 ERROR security.UserGroupInformation: PriviledgedActionException
> as:cyril cause:org.apache.hadoop.mapreduce.lib.input.InvalidInputException:
> Input path does not exist: file:/home/cyril/workspace/Newer/testdata_seq/data
> Exception in thread "main" org.apache.hadoop.mapreduce.lib.input.InvalidInputException:
> Input path does not exist: file:/home/cyril/workspace/Newer/testdata_seq/data
>     at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:235)
>     at org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat.listStatus(SequenceFileInputFormat.java:55)
>     at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:252)
>     at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:962)
>     at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:979)
>     at org.apache.hadoop.mapred.JobClient.access$600(JobClient.java:174)
>     at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:897)
>     at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:850)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:416)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
>     at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:850)
>     at org.apache.hadoop.mapreduce.Job.submit(Job.java:500)
>     at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:530)
>     at org.apache.mahout.clustering.classify.ClusterClassificationDriver.classifyClusterMR(ClusterClassificationDriver.java:275)
>     at org.apache.mahout.clustering.classify.ClusterClassificationDriver.run(ClusterClassificationDriver.java:135)
>     at org.apache.mahout.clustering.canopy.CanopyDriver.clusterData(CanopyDriver.java:372)
>     at org.apache.mahout.clustering.canopy.CanopyDriver.run(CanopyDriver.java:158)
>     at DataFileWriter.main(DataFileWriter.java:85)
>
> On another note, is there a command that would let the program overwrite
> existing files in the filesystem? (I get errors if I don't delete the
> files before running the program again.)
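> For example, I imagine something along these lines before each run, using
> FileSystem.delete (just my guess at the right call; "testdata_seq" is the
> output root from the program above):
>
>     Path root = new Path("testdata_seq");
>     if (fs.exists(root)) {
>         fs.delete(root, true); // true = delete recursively
>     }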
>
> Thank you for the reply, and I hope I have given all the necessary output.
> In the meantime I will keep looking into it.
>
> Cyril
>
