mahout-user mailing list archives

From Liang Chenmin <liangchenmi...@gmail.com>
Subject Re: Job failed when running CanopyDriver on large dataset
Date Tue, 08 Dec 2009 22:12:09 GMT
Hi Grant,
    Thanks for your reply. I am using Mahout 0.2. I also suspected a
difference in Hadoop versions, but I now see a "matrix.CardinalityException"
in the logs, and I think that might be the cause of the failure. I did not
see that exception before. I am looking into the problem right now.
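
To illustrate what a cardinality mismatch means, here is a minimal sketch in plain Python. It is not Mahout code, and CardinalityError, euclidean_distance, and find_mismatched are hypothetical names used only for illustration. It assumes the exception arises because a distance measure is asked to compare two vectors of different sizes, which has not been confirmed for this job.

```python
import math


class CardinalityError(ValueError):
    """Hypothetical stand-in for a cardinality (dimension) mismatch error."""


def euclidean_distance(a, b):
    # Refuse to compare vectors whose sizes differ.
    if len(a) != len(b):
        raise CardinalityError(f"cardinality mismatch: {len(a)} vs {len(b)}")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def find_mismatched(vectors):
    # Return the indices of vectors whose size differs from the first vector's.
    if not vectors:
        return []
    expected = len(vectors[0])
    return [i for i, v in enumerate(vectors) if len(v) != expected]
```

A check like find_mismatched over a sample of the input vectors would show whether one input file was written with a different vector size than the rest.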

Thanks,
Chenmin

On Tue, Dec 8, 2009 at 12:52 PM, Grant Ingersoll <gsingers@apache.org> wrote:

> Is there an exception somewhere (check the Hadoop lower level logs)?  What
> version of Mahout are you on?  One of my concerns is that AEM is on Hadoop
> 0.18.3 (I think) and Mahout is on a later version.
>
> On Dec 8, 2009, at 1:45 PM, Liang Chenmin wrote:
>
> > Hi,
> >   I am running clustering using Mahout on Amazon Elastic Mapreduce. The
> > canopy clustering step failed at some point. I am using 8 instances of
> > m1.xlarge. The machine that Amazon uses for the xlarge instance is
> > configured as follows:
> >
> >      Extra Large Instance: 15 GB of memory, 8 EC2 Compute Units (4 virtual
> >      cores with 2 EC2 Compute Units each), 1690 GB of instance storage,
> >      64-bit platform
> >
> > The step that causes the errors is the Canopy clustering step:
> >    2009-12-08 09:52:03,057 INFO org.apache.mahout.clustering.canopy.CanopyDriver (main): Input: s3://mahout-output/xMDQYCbDtc/data Out: s3://mahout-output/xMDQYCbDtc/canopies Measure: org.apache.mahout.common.distance.EuclideanDistanceMeasure t1: 80.0 t2: 55.0 Vector Class: SparseVector
> >
> > And the last few lines of the syslog are as follows:
> >
> > 2009-12-08 09:52:03,196 WARN org.apache.hadoop.mapred.JobClient (main): Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
> > 2009-12-08 09:52:04,014 INFO org.apache.hadoop.mapred.FileInputFormat (main): Total input paths to process : 105
> > 2009-12-08 09:52:04,222 INFO org.apache.hadoop.mapred.FileInputFormat (main): Total input paths to process : 105
> > 2009-12-08 09:52:07,301 INFO org.apache.hadoop.mapred.JobClient (main): Running job: job_200912080939_0002
> > 2009-12-08 09:52:08,304 INFO org.apache.hadoop.mapred.JobClient (main):  map 0% reduce 0%
> > 2009-12-08 09:52:17,331 INFO org.apache.hadoop.mapred.JobClient (main):  map 2% reduce 0%
> > 2009-12-08 09:52:18,335 INFO org.apache.hadoop.mapred.JobClient (main):  map 5% reduce 0%
> > 2009-12-08 09:52:20,340 INFO org.apache.hadoop.mapred.JobClient (main):  map 12% reduce 0%
> > 2009-12-08 09:52:21,343 INFO org.apache.hadoop.mapred.JobClient (main):  map 13% reduce 0%
> > 2009-12-08 09:52:22,347 INFO org.apache.hadoop.mapred.JobClient (main):  map 17% reduce 0%
> > 2009-12-08 09:52:24,363 INFO org.apache.hadoop.mapred.JobClient (main):  map 18% reduce 0%
> > 2009-12-08 09:52:25,367 INFO org.apache.hadoop.mapred.JobClient (main):  map 25% reduce 0%
> > 2009-12-08 09:52:26,371 INFO org.apache.hadoop.mapred.JobClient (main):  map 28% reduce 0%
> > 2009-12-08 09:52:27,374 INFO org.apache.hadoop.mapred.JobClient (main):  map 31% reduce 0%
> > 2009-12-08 09:52:28,377 INFO org.apache.hadoop.mapred.JobClient (main):  map 35% reduce 0%
> > 2009-12-08 09:52:29,380 INFO org.apache.hadoop.mapred.JobClient (main):  map 36% reduce 0%
> > 2009-12-08 09:52:30,383 INFO org.apache.hadoop.mapred.JobClient (main):  map 39% reduce 0%
> > 2009-12-08 09:52:31,386 INFO org.apache.hadoop.mapred.JobClient (main):  map 43% reduce 0%
> > 2009-12-08 09:52:32,388 INFO org.apache.hadoop.mapred.JobClient (main):  map 45% reduce 0%
> > 2009-12-08 09:52:33,392 INFO org.apache.hadoop.mapred.JobClient (main):  map 57% reduce 0%
> > 2009-12-08 09:52:34,395 INFO org.apache.hadoop.mapred.JobClient (main):  map 62% reduce 0%
> > 2009-12-08 09:52:35,399 INFO org.apache.hadoop.mapred.JobClient (main):  map 69% reduce 0%
> > 2009-12-08 09:52:36,409 INFO org.apache.hadoop.mapred.JobClient (main):  map 79% reduce 0%
> > 2009-12-08 09:52:37,413 INFO org.apache.hadoop.mapred.JobClient (main):  map 82% reduce 0%
> > 2009-12-08 09:52:38,417 INFO org.apache.hadoop.mapred.JobClient (main):  map 90% reduce 0%
> > 2009-12-08 09:52:39,420 INFO org.apache.hadoop.mapred.JobClient (main):  map 99% reduce 0%
> > 2009-12-08 09:52:42,432 INFO org.apache.hadoop.mapred.JobClient (main): Task Id : attempt_200912080939_0002_m_000104_0, Status : FAILED
> > 2009-12-08 09:52:43,531 INFO org.apache.hadoop.mapred.JobClient (main):  map 99% reduce 6%
> > 2009-12-08 09:52:48,544 INFO org.apache.hadoop.mapred.JobClient (main):  map 99% reduce 13%
> > 2009-12-08 09:52:48,544 INFO org.apache.hadoop.mapred.JobClient (main): Task Id : attempt_200912080939_0002_m_000104_1, Status : FAILED
> > 2009-12-08 09:52:53,564 INFO org.apache.hadoop.mapred.JobClient (main):  map 99% reduce 15%
> > 2009-12-08 09:52:54,567 INFO org.apache.hadoop.mapred.JobClient (main): Task Id : attempt_200912080939_0002_m_000104_2, Status : FAILED
> > 2009-12-08 09:52:58,605 INFO org.apache.hadoop.mapred.JobClient (main):  map 99% reduce 22%
> >
> > I am a newbie to Hadoop and Mahout, and I am seeking some help here. It
> > seems that part of the MapReduce job fails. Is it because the file size is
> > too big? Or are there too many input paths?
> >
> > Thanks!
>
> --------------------------
> Grant Ingersoll
> http://www.lucidimagination.com/
>
> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using
> Solr/Lucene:
> http://www.lucidimagination.com/search
>
>


-- 
Chenmin Liang
Language Technologies Institute, School of Computer Science
Carnegie Mellon University
