From: Dan Filimon
Date: Fri, 29 Mar 2013 21:39:52 +0200
Subject: Re: FileSystem Error
To: user@mahout.apache.org

Happy to help! :)

On Fri, Mar 29, 2013 at 9:38 PM, Cyril Bogus wrote:

> THANK YOU SO MUCH DAN...
>
> It even solved another problem I was having with Sqoop, which couldn't
> connect to HDFS from Java.
>
>
> On Fri, Mar 29, 2013 at 3:30 PM, Dan Filimon wrote:
>
> > Maybe this helps?
> >
> > http://www.opensourceconnections.com/2013/03/24/hdfs-debugging-wrong-fs-expected-file-exception/
> >
> >
> > On Fri, Mar 29, 2013 at 9:27 PM, Cyril Bogus wrote:
> >
> > > Kind of saw this coming, since I felt like file:/// would be appended,
> > > but here is the error I get if I do it:
> > >
> > > ERROR: java.lang.IllegalArgumentException: Wrong FS:
> > > hdfs://super:54310/user/cyril/testdata_seq, expected: file:///
> > >
> > >
> > > On Fri, Mar 29, 2013 at 1:27 PM, Dan Filimon <
> > > dangeorge.filimon@gmail.com> wrote:
> > >
> > > > One thing that you could try is just using _absolute paths_
> > > > everywhere. So, something on HDFS is hdfs://... whereas something
> > > > on your local file system is file://...
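[A minimal, unofficial sketch of what fully qualified paths could look like here. The NameNode URI hdfs://super:54310 is taken from the "Wrong FS" error above, the local path is illustrative, and the class name is made up; adjust everything to your own setup.]

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class QualifiedPathsExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Bind explicitly to the cluster filesystem and to the local one,
        // instead of relying on whatever fs.default.name resolves to.
        FileSystem hdfs = FileSystem.get(URI.create("hdfs://super:54310"), conf);
        FileSystem local = FileSystem.getLocal(conf);

        // Absolute, scheme-qualified paths leave no room for ambiguity.
        Path onHdfs = new Path("hdfs://super:54310/user/cyril/testdata_seq/data");
        Path onLocal = new Path("file:///home/cyril/workspace/Newer/src/hadoop.properties");

        System.out.println(onHdfs + " on HDFS? " + hdfs.exists(onHdfs));
        System.out.println(onLocal + " on the local FS? " + local.exists(onLocal));
    }
}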
> > > >
> > > > On Fri, Mar 29, 2013 at 7:05 PM, Cyril Bogus wrote:
> > > >
> > > > > Thank you again Chris.
> > > > >
> > > > > Yes, it is a typo.
> > > > >
> > > > > After careful reading of the output, my program is doing exactly what
> > > > > you describe. I am trying to do everything on the Hadoop fs, but it is
> > > > > creating files on both the Hadoop fs and the class fs, and some files
> > > > > are missing. When I run it AND copy the missing file from the Hadoop fs
> > > > > into the class file location, I get the proper output (no errors). I
> > > > > also get the proper output when I do everything within the class file
> > > > > location (by removing the property from conf).
> > > > >
> > > > > But I am trying to automate everything to run on my three-node cluster
> > > > > for testing from Java, so I need to be able to do everything on the
> > > > > Hadoop fs. I will look into setting up Mahout with a proper *conf* file.
> > > > >
> > > > > - Cyril
> > > > >
> > > > >
> > > > > On Fri, Mar 29, 2013 at 12:34 PM, Chris Harrington <chris@heystaks.com> wrote:
> > > > >
> > > > > > Well then, do all the various folders exist on the hadoop fs?
> > > > > >
> > > > > > I also had a similar problem a while ago where my program ran fine, but
> > > > > > then I did something (no idea what) and hadoop started complaining. To
> > > > > > fix it I had to put everything on the hadoop fs, i.e. move everything
> > > > > > from <fs path to>/data to data.
> > > > > >
> > > > > > One more strange issue I ran into was where I had identically named
> > > > > > folders on both local and hdfs and it was looking in the wrong one.
> > > > > >
> > > > > > I think that's all the causes I've run into, so if they're not the cause
> > > > > > then I'm out of ideas and hopefully someone else will be able to help.
> > > > > >
> > > > > > Also, the missing colon is a typo, right? hdfs//mylocation
> > > > > >
> > > > > > On 29 Mar 2013, at 16:09, Cyril Bogus wrote:
> > > > > >
> > > > > > > Thank you for the reply Chris,
> > > > > > >
> > > > > > > I create and write fine on the file system, and the file is there when
> > > > > > > I check hadoop, so I do not think the problem is privileges. As I read
> > > > > > > it, the CanopyDriver is looking for the file under the class file path
> > > > > > > (/home/cyrille/DataWriter/src/testdata_seq/) instead of Hadoop's
> > > > > > > (/user/cyrille/), and the file is not there, so it gives me the error
> > > > > > > that the file does not exist. But the file exists and was created fine
> > > > > > > "within the program with the same conf variable".
> > > > > > >
> > > > > > > - Cyril
> > > > > > >
> > > > > > >
> > > > > > > On Fri, Mar 29, 2013 at 12:01 PM, Chris Harrington <chris@heystaks.com> wrote:
> > > > > > >
> > > > > > >>> security.UserGroupInformation:
> > > > > > >>> PriviledgedActionException as:cyril
> > > > > > >>
> > > > > > >> I'm not entirely sure, but that sounds like a permissions issue to me.
> > > > > > >> Check that all the files are owned by the user cyril and not root.
> > > > > > >> Also, did you start hadoop as root and run the program as cyril?
> > > > > > >> Hadoop might also complain about that.
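[For reference, one way to make FileSystem.get(conf) resolve to the cluster rather than the local filesystem, and to have the Mahout drivers see the same settings, is to load the cluster's own core-site.xml into the Configuration. A minimal sketch only; the path to core-site.xml, the NameNode URI, and the class name are illustrative assumptions.]

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ClusterConfExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Option 1: load the cluster's core-site.xml directly, so
        // fs.default.name (and the rest) matches the cluster exactly.
        conf.addResource(new Path("/etc/hadoop/conf/core-site.xml"));

        // Option 2: set the default filesystem by hand (Hadoop 1.x name).
        // conf.set("fs.default.name", "hdfs://super:54310");

        // FileSystem.get(conf) should now return the HDFS client, not the
        // local filesystem; relative paths will resolve under /user/<you>.
        FileSystem fs = FileSystem.get(conf);
        System.out.println("Default filesystem: " + fs.getUri());
    }
}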
> > > > > > >>
> > > > > > >> On 29 Mar 2013, at 15:54, Cyril Bogus wrote:
> > > > > > >>
> > > > > > >>> Hi,
> > > > > > >>>
> > > > > > >>> I am running a small Java program that writes a small input data set
> > > > > > >>> to the Hadoop FileSystem, runs Mahout Canopy and KMeans clustering,
> > > > > > >>> and then prints the contents of the result.
> > > > > > >>>
> > > > > > >>> In my hadoop.properties I have included the core-site.xml definition
> > > > > > >>> so that the Java program connects to my single-node setup and does
> > > > > > >>> not use the Java project's file system but Hadoop instead (basically,
> > > > > > >>> all reads and writes are done on Hadoop and not in the class path).
> > > > > > >>>
> > > > > > >>> When I run the program, as soon as Canopy (and likewise KMeans)
> > > > > > >>> starts, the configuration looks up the file on the class path instead
> > > > > > >>> of the Hadoop FileSystem path where the proper files are located.
> > > > > > >>>
> > > > > > >>> Is there a problem with the way I have my conf defined?
> > > > > > >>>
> > > > > > >>> hadoop.properties:
> > > > > > >>> fs.default.name=hdfs//mylocation
> > > > > > >>>
> > > > > > >>> Program:
> > > > > > >>>
> > > > > > >>> public class DataFileWriter {
> > > > > > >>>
> > > > > > >>>     private static Properties props = new Properties();
> > > > > > >>>     private static Configuration conf = new Configuration();
> > > > > > >>>
> > > > > > >>>     /**
> > > > > > >>>      * @param args
> > > > > > >>>      * @throws ClassNotFoundException
> > > > > > >>>      * @throws InterruptedException
> > > > > > >>>      * @throws IOException
> > > > > > >>>      */
> > > > > > >>>     public static void main(String[] args) throws IOException,
> > > > > > >>>             InterruptedException, ClassNotFoundException {
> > > > > > >>>
> > > > > > >>>         props.load(new FileReader(new File(
> > > > > > >>>                 "/home/cyril/workspace/Newer/src/hadoop.properties")));
> > > > > > >>>
> > > > > > >>>         FileSystem fs = null;
> > > > > > >>>         SequenceFile.Writer writer;
> > > > > > >>>         SequenceFile.Reader reader;
> > > > > > >>>
> > > > > > >>>         conf.set("fs.default.name", props.getProperty("fs.default.name"));
> > > > > > >>>
> > > > > > >>>         List<NamedVector> vectors = new LinkedList<NamedVector>();
> > > > > > >>>         NamedVector v1 = new NamedVector(new DenseVector(new double[] {
> > > > > > >>>                 0.1, 0.2, 0.5 }), "Hello");
> > > > > > >>>         vectors.add(v1);
> > > > > > >>>         v1 = new NamedVector(new DenseVector(new double[] { 0.5, 0.1, 0.2 }),
> > > > > > >>>                 "Bored");
> > > > > > >>>         vectors.add(v1);
> > > > > > >>>         v1 = new NamedVector(new DenseVector(new double[] { 0.2, 0.5, 0.1 }),
> > > > > > >>>                 "Done");
> > > > > > >>>         vectors.add(v1);
> > > > > > >>>
> > > > > > >>>         // Write the data to a SequenceFile
> > > > > > >>>         try {
> > > > > > >>>             fs = FileSystem.get(conf);
> > > > > > >>>
> > > > > > >>>             Path path = new Path("testdata_seq/data");
> > > > > > >>>             writer = new SequenceFile.Writer(fs, conf, path, Text.class,
> > > > > > >>>                     VectorWritable.class);
> > > > > > >>>
> > > > > > >>>             VectorWritable vec = new VectorWritable();
> > > > > > >>>             for (NamedVector vector : vectors) {
> > > > > > >>>                 vec.set(vector);
> > > > > > >>>                 writer.append(new Text(vector.getName()), vec);
> > > > > > >>>             }
> > > > > > >>>             writer.close();
> > > > > > >>>
> > > > > > >>>         } catch (Exception e) {
> > > > > > >>>             System.out.println("ERROR: " + e);
> > > > > > >>>         }
> > > > > > >>>
> > > > > > >>>         Path input = new Path("testdata_seq/data");
> > > > > > >>>         boolean runSequential = false;
> > > > > > >>>         Path clustersOut = new Path("testdata_seq/clusters");
> > > > > > >>>         Path clustersIn = new Path("testdata_seq/clusters/clusters-0-final");
> > > > > > >>>         double convergenceDelta = 0;
> > > > > > >>>         double clusterClassificationThreshold = 0;
> > > > > > >>>         boolean runClustering = true;
> > > > > > >>>         Path output = new Path("testdata_seq/output");
> > > > > > >>>         int maxIterations = 12;
> > > > > > >>>         CanopyDriver.run(conf, input, clustersOut,
> > > > > > >>>                 new EuclideanDistanceMeasure(), 1, 1, 1, 1, 0, runClustering,
> > > > > > >>>                 clusterClassificationThreshold, runSequential);
> > > > > > >>>         KMeansDriver.run(conf, input, clustersIn, output,
> > > > > > >>>                 new EuclideanDistanceMeasure(), convergenceDelta,
> > > > > > >>>                 maxIterations, runClustering,
> > > > > > >>>                 clusterClassificationThreshold, runSequential);
> > > > > > >>>
> > > > > > >>>         reader = new SequenceFile.Reader(fs,
> > > > > > >>>                 new Path("testdata_seq/clusteredPoints/part-m-00000"), conf);
> > > > > > >>>
> > > > > > >>>         IntWritable key = new IntWritable();
> > > > > > >>>         WeightedVectorWritable value = new WeightedVectorWritable();
> > > > > > >>>         while (reader.next(key, value)) {
> > > > > > >>>             System.out.println(value.toString() + " belongs to cluster "
> > > > > > >>>                     + key.toString());
> > > > > > >>>         }
> > > > > > >>>     }
> > > > > > >>>
> > > > > > >>> }
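[A quick way to see which filesystem the relative paths in the program above actually resolve against, before CanopyDriver.run() is called, is to print the default FS URI and the qualified path. A minimal sketch only; the fs.default.name value and the class name are illustrative, and it reuses the path names from the program above. If it prints file:/..., the drivers will look on local disk, which matches the error below.]

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class WhereDoesThePathResolve {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.default.name", "hdfs://super:54310"); // illustrative value

        FileSystem fs = FileSystem.get(conf);
        Path input = new Path("testdata_seq/data");

        // Show the default filesystem and where the relative path lands on it.
        System.out.println("Default FS:     " + fs.getUri());
        System.out.println("Qualified path: " + fs.makeQualified(input));
        System.out.println("Exists there?   " + fs.exists(input));
    }
}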
> > > > > > >>>
> > > > > > >>> Error Output:
> > > > > > >>>
> > > > > > >>> .......
> > > > > > >>> 13/03/29 11:47:15 ERROR security.UserGroupInformation:
> > > > > > >>> PriviledgedActionException as:cyril
> > > > > > >>> cause:org.apache.hadoop.mapreduce.lib.input.InvalidInputException:
> > > > > > >>> Input path does not exist: file:/home/cyril/workspace/Newer/testdata_seq/data
> > > > > > >>> Exception in thread "main"
> > > > > > >>> org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path
> > > > > > >>> does not exist: file:/home/cyril/workspace/Newer/testdata_seq/data
> > > > > > >>>     at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:235)
> > > > > > >>>     at org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat.listStatus(SequenceFileInputFormat.java:55)
> > > > > > >>>     at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:252)
> > > > > > >>>     at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:962)
> > > > > > >>>     at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:979)
> > > > > > >>>     at org.apache.hadoop.mapred.JobClient.access$600(JobClient.java:174)
> > > > > > >>>     at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:897)
> > > > > > >>>     at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:850)
> > > > > > >>>     at java.security.AccessController.doPrivileged(Native Method)
> > > > > > >>>     at javax.security.auth.Subject.doAs(Subject.java:416)
> > > > > > >>>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
> > > > > > >>>     at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:850)
> > > > > > >>>     at org.apache.hadoop.mapreduce.Job.submit(Job.java:500)
> > > > > > >>>     at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:530)
> > > > > > >>>     at org.apache.mahout.clustering.classify.ClusterClassificationDriver.classifyClusterMR(ClusterClassificationDriver.java:275)
> > > > > > >>>     at org.apache.mahout.clustering.classify.ClusterClassificationDriver.run(ClusterClassificationDriver.java:135)
> > > > > > >>>     at org.apache.mahout.clustering.canopy.CanopyDriver.clusterData(CanopyDriver.java:372)
> > > > > > >>>     at org.apache.mahout.clustering.canopy.CanopyDriver.run(CanopyDriver.java:158)
> > > > > > >>>     at DataFileWriter.main(DataFileWriter.java:85)
> > > > > > >>>
> > > > > > >>>
> > > > > > >>> On another note, is there a command that would allow the program to
> > > > > > >>> overwrite existing files in the filesystem? (I get errors if I don't
> > > > > > >>> delete the files before running the program again.)
> > > > > > >>>
> > > > > > >>> Thank you for a reply, and I hope I have given all the necessary output.
> > > > > > >>> In the meantime I will look into it.
> > > > > > >>>
> > > > > > >>> Cyril
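[On the question above about overwriting existing output: one common approach is simply to delete the previous run's output directories through the FileSystem API before rerunning; Mahout's org.apache.mahout.common.HadoopUtil.delete helper wraps the same idea. A minimal sketch under those assumptions, reusing the relative path names from the program above; the class name is made up, and conf is expected to carry the same fs.default.name as the rest of the program.]

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CleanOldOutput {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Output locations written by the previous run of the program above.
        Path[] previousRunOutput = {
                new Path("testdata_seq/clusters"),
                new Path("testdata_seq/output"),
                new Path("testdata_seq/clusteredPoints")
        };

        for (Path p : previousRunOutput) {
            if (fs.exists(p)) {
                fs.delete(p, true); // recursive delete of the old files
            }
        }
    }
}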