mahout-user mailing list archives

From Ying Liao <yliao...@gmail.com>
Subject Re: is Hadoop based SVD_ALS a complete feature?
Date Mon, 21 Jan 2013 22:59:47 GMT
The last problem was due to a Hadoop version conflict. Thanks, Sebastian. I
updated the POM with the Hadoop version I am using, recompiled, and the error
is gone.
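
For anyone who hits the same IncompatibleClassChangeError: it is consistent
with code compiled against a Hadoop release where
org.apache.hadoop.mapreduce.JobContext is a class (the 0.20/1.x line) running
on a release where it is an interface (the 0.23/2.x line). A minimal sketch for
checking which form a given classpath provides is below. The class name
ClassKindCheck and the method kindOf are hypothetical helpers, not part of
Mahout or Hadoop.

```java
// Sketch: report whether a type name resolves to an interface or a class on
// the current classpath. ClassKindCheck is a hypothetical helper, not a
// Mahout or Hadoop API.
final class ClassKindCheck {

    // Returns "interface", "class", or "not found" for the given type name.
    static String kindOf(String className) {
        try {
            // initialize=false: inspect the type without running its static initializers
            Class<?> type = Class.forName(className, false,
                    ClassKindCheck.class.getClassLoader());
            return type.isInterface() ? "interface" : "class";
        } catch (ClassNotFoundException e) {
            return "not found";
        }
    }

    public static void main(String[] args) {
        // Defaults to the type named in the stack trace; pass another name to check it.
        String name = args.length > 0
                ? args[0]
                : "org.apache.hadoop.mapreduce.JobContext";
        System.out.println(name + " -> " + kindOf(name));
    }
}
```

Run it with the same Hadoop jars on the classpath that the cluster uses. If it
reports "interface" while Mahout was built against a release where JobContext
is a class, rebuilding against the matching Hadoop version is the likely fix.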

Now I have a new problem. I am working on a very sparse dataset: 60M
records from 3M users and 12M items. On a 9-machine cluster, each
iteration takes 13 minutes with 20 features but tens of hours with 60
features. Has anyone else seen this?
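
As a rough sanity check on those numbers, here is a back-of-the-envelope
sketch. It assumes the textbook ALS cost model, in which one iteration costs
about 2 * ratings * k^2 operations to accumulate the k x k normal equations
plus (users + items) * k^3 to solve them. It ignores I/O, shuffle, and memory
effects, and it is not derived from Mahout's implementation. AlsCostEstimate
and its methods are hypothetical names.

```java
// Sketch: relative per-iteration cost of ALS under a simple operation-count
// model. An assumption for illustration, not a measurement of Mahout.
final class AlsCostEstimate {

    // Approximate operation count for one ALS iteration with k features.
    static double iterationCost(long ratings, long users, long items, int k) {
        // Each rating adds a k x k outer product once on the user side
        // and once on the item side.
        double build = 2.0 * ratings * k * k;
        // One k x k linear system is solved per user and per item.
        double solve = (double) (users + items) * k * k * k;
        return build + solve;
    }

    // How many times more expensive kNew features are than kOld features.
    static double slowdown(long ratings, long users, long items, int kOld, int kNew) {
        return iterationCost(ratings, users, items, kNew)
                / iterationCost(ratings, users, items, kOld);
    }

    public static void main(String[] args) {
        long ratings = 60_000_000L;
        long users = 3_000_000L;
        long items = 12_000_000L;
        double factor = slowdown(ratings, users, items, 20, 60);
        double minutesAt20 = 13.0; // figure reported above
        System.out.printf("estimated slowdown: %.1fx%n", factor);
        System.out.printf("estimated time per iteration: %.1f hours%n",
                minutesAt20 * factor / 60.0);
    }
}
```

Tripling the feature count scales the first term by 9 and the second by 27,
so under this model the slowdown falls between 9x and 27x, which works out to
a few hours per iteration here. Tens of hours is more than the arithmetic alone
predicts, which may point to something other than raw computation, such as
memory pressure.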

Thanks,
Ying




On Thu, Jan 17, 2013 at 11:09 AM, Pat Ferrel <pat.ferrel@gmail.com> wrote:

> There is a problem in factorize-movielens-1M.sh, and the DatasetSplitter
> needs to initialize the args parser before it accesses the options (I think
> I put in a ticket for the DatasetSplitter with a patch). The last problem
> below is Ying Liao's alone.
>
> On Jan 17, 2013, at 7:12 AM, Sebastian Schelter <ssc@apache.org> wrote:
>
> Which version/distribution of Hadoop are you using?
>
> On 17.01.2013 16:08, Pat Ferrel wrote:
> > +1 this, found the same problems, same fixes. Haven't seen your last
> > problem.
> >
> > On Jan 11, 2013, at 1:41 PM, Ying Liao <yliao422@gmail.com> wrote:
> >
> > I am trying factorize-movielens-1M.sh. I first found a bug in the sh file.
> > Then I found a bug in
> > org.apache.mahout.cf.taste.hadoop.als.DatasetSplitter:
> > the argMap is not mapped. Now I hit a third bug:
> > [cloudera@localhost trunk]$ hadoop jar
> > /home/cloudera/workspace/Mahout/trunk/examples/target/mahout-examples-0.8-SNAPSHOT-job.jar
> > org.apache.mahout.cf.taste.hadoop.als.DatasetSplitter --input
> > /user/cloudera/ratings.csv --output /user/cloudera/dataset
> > --trainingPercentage 0.9 --probePercentage 0.1 --tempDir
> > /user/cloudera/dataset/tmp
> > 13/01/11 16:37:30 INFO common.AbstractJob: Command line arguments:
> > {--endPhase=[2147483647], --input=[/user/cloudera/ratings.csv],
> > --output=[/user/cloudera/dataset], --probePercentage=[0.1],
> > --startPhase=[0], --tempDir=[/user/cloudera/dataset/tmp],
> > --trainingPercentage=[0.9]}
> > 13/01/11 16:37:30 WARN conf.Configuration: mapred.input.dir is deprecated.
> > Instead, use mapreduce.input.fileinputformat.inputdir
> > 13/01/11 16:37:30 WARN conf.Configuration: mapred.compress.map.output is
> > deprecated. Instead, use mapreduce.map.output.compress
> > 13/01/11 16:37:30 WARN conf.Configuration: mapred.output.dir is deprecated.
> > Instead, use mapreduce.output.fileoutputformat.outputdir
> > Exception in thread "main" java.lang.IncompatibleClassChangeError: Found
> > interface org.apache.hadoop.mapreduce.JobContext, but class was expected
> > at org.apache.mahout.common.HadoopUtil.getCustomJobName(HadoopUtil.java:166)
> > at org.apache.mahout.common.AbstractJob.prepareJob(AbstractJob.java:553)
> > at org.apache.mahout.cf.taste.hadoop.als.DatasetSplitter.run(DatasetSplitter.java:85)
> > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
> > at org.apache.mahout.cf.taste.hadoop.als.DatasetSplitter.main(DatasetSplitter.java:62)
> > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> > at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> > at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> > at java.lang.reflect.Method.invoke(Method.java:597)
> > at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
> >
> > Any help is appreciated.
> >
> > Thanks,
> > Ying
> >
>
>
>
