mahout-user mailing list archives

From Ken Krugler <kkrugler_li...@transpac.com>
Subject Re: Using SparseVectorsFromSequenceFiles () in Java
Date Wed, 18 Sep 2013 14:02:38 GMT
Hi Darius,

On Sep 18, 2013, at 1:10am, Gokhan Capan wrote:

> It seems you hit a "Hadoop on Windows" issue, it might have something to do
> with how Hadoop sets file permissions.

In my experience, only the (old) 0.20.2 version of Hadoop works well with Cygwin; otherwise
you run into file permission issues like the one you mentioned.

If you want to give that version a try, and can't find a download, see http://scaleunlimited.com/downloads/3nn2pq/hadoop-0.20.2.tgz
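
If you're not sure which Hadoop version actually ends up on your classpath at runtime, something like the sketch below should print it (plain Hadoop API, nothing Mahout-specific; the class name VersionCheck is just a placeholder):

    import org.apache.hadoop.util.VersionInfo;

    public class VersionCheck {
      public static void main(String[] args) {
        // Prints the Hadoop version found on the classpath at runtime,
        // e.g. "0.20.2" or "1.2.1".
        System.out.println("Hadoop version: " + VersionInfo.getVersion());
      }
    }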

-- Ken


> On Tue, Sep 17, 2013 at 3:02 PM, Darius Miliauskas <
> dariui.miliauskui@gmail.com> wrote:
> 
>> That works like a charm, Gokhan; your suggestion was on point again. However,
>> despite the fact that the build is successful, the file is still empty,
>> and I get the same exception as always on Windows:
>> 
>> java.io.IOException: Failed to set permissions of path:
>> \tmp\hadoop-DARIUS\mapred\staging\DARIUS331150778\.staging to 0777
>>   at org.apache.hadoop.fs.FileUtil.checkReturnValue(FileUtil.java:689)
>>   at org.apache.hadoop.fs.FileUtil.setPermission(FileUtil.java:670)
>>   at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:514)
>>   at org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:349)
>>   at org.apache.hadoop.fs.FilterFileSystem.mkdirs(FilterFileSystem.java:189)
>>   at org.apache.hadoop.mapreduce.JobSubmissionFiles.getStagingDir(JobSubmissionFiles.java:116)
>>   at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:918)
>>   at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:912)
>>   at java.security.AccessController.doPrivileged(Native Method)
>>   at javax.security.auth.Subject.doAs(Subject.java:415)
>>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
>>   at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:912)
>>   at org.apache.hadoop.mapreduce.Job.submit(Job.java:500)
>>   at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:530)
>>   at org.apache.mahout.vectorizer.DocumentProcessor.tokenizeDocuments(DocumentProcessor.java:93)
>>   at org.apache.mahout.vectorizer.SparseVectorsFromSequenceFiles.run(SparseVectorsFromSequenceFiles.java:257)
>>   at org.apache.mahout.mahoutnewsrecommender2.Recommender.myRecommender(Recommender.java:99)
>>   at org.apache.mahout.mahoutnewsrecommender2.App.main(App.java:26)
>> 
>> BUILD SUCCESSFUL (total time: 3 seconds)
>> 
>> 
>> Thanks,
>> 
>> Darius
>> 
>> 
>> 
>> 
>> 2013/9/12 Gokhan Capan <gkhncpn@gmail.com>
>> 
>>> Although Windows is not officially supported, your
>>> svfsf.run(new String[]{inputPath.toString(), outputPath.toString()})
>>> should be
>>> svfsf.run(new String[]{"-i", inputPath.toString(), "-o", outputPath.toString()})
>>> anyway.
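
For what it's worth, here's a minimal sketch of that corrected call; the paths are placeholders, and the input directory is assumed to hold SequenceFile<Text, Text> documents (e.g. produced by seqdirectory):

    import org.apache.mahout.vectorizer.SparseVectorsFromSequenceFiles;

    public class VectorizeSketch {
      public static void main(String[] args) throws Exception {
        // Placeholder paths: input should be a directory of SequenceFile<Text, Text>
        // documents, and output is a directory the job will create.
        String input = "C:\\mahout\\seqfiles";
        String output = "C:\\mahout\\vectors";

        SparseVectorsFromSequenceFiles svfsf = new SparseVectorsFromSequenceFiles();
        // Pass the paths via the -i/-o options rather than as bare arguments.
        int exitCode = svfsf.run(new String[] {"-i", input, "-o", output});
        System.out.println("exit code: " + exitCode);
      }
    }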
>>> 
>>> Best
>>> 
>>> 
>>> Gokhan
>>> 
>>> 
>>> On Thu, Sep 12, 2013 at 4:14 PM, Darius Miliauskas <
>>> dariui.miliauskui@gmail.com> wrote:
>>> 
>>>> Dear All,
>>>> 
>>>> I am trying to use SparseVectorsFromSequenceFiles() through Java code
>>>> (NetBeans 7 & Windows 7). Here is my code (API):
>>>> 
>>>> // inputPath is the path of my SequenceFile
>>>> Path inputPath = new Path("C:\\Users\\DARIUS\\forTest1.txt");
>>>> 
>>>> // outputPath is where I expect some results
>>>> Path outputPath = new Path("C:\\Users\\DARIUS\\forTest2.txt");
>>>> 
>>>> SparseVectorsFromSequenceFiles svfsf = new SparseVectorsFromSequenceFiles();
>>>> svfsf.run(new String[]{inputPath.toString(), outputPath.toString()});
>>>> 
>>>> The build is successful. However, at the end I get just an empty file where
>>>> I expected my output. Do you have any idea why the output file is empty,
>>>> and what I should change in the code to get the results?
>>>> 
>>>> 
>>>> Ciao,
>>>> 
>>>> Darius
>>>> 
>>> 
>> 

--------------------------
Ken Krugler
+1 530-210-6378
http://www.scaleunlimited.com
custom big data solutions & training
Hadoop, Cascading, Cassandra & Solr





