mahout-user mailing list archives

From Fernando Fernández <fernando.fernandez.gonza...@gmail.com>
Subject Re: RowSimilarity startphase and endphase parameters
Date Mon, 20 Dec 2010 11:41:47 GMT
But does this affect the result? What will I get if I launch RowSimilarity
(cosine similarity) with --startPhase=1 and --endPhase=2? I don't fully
understand what exactly "phases" are in this case.

2010/12/20 Niall Riddell <niall.riddell@xspca.com>

> Startphase and endphase shouldn't impact overall performance in any way.
> However, they do mean that you can start at a later stage in a job pipeline.
>
> You can execute specific MR jobs by designating a startphase and endphase.
> It goes without saying that the correct inputs must be available to start a
> phase correctly.
>
> The first MR job is index 0. So setting --startPhase 1 will execute the 2nd
> job onwards. Putting in --endPhase 2 would stop after the 3rd job.
> On 20 Dec 2010 11:17, "Fernando Fernández" <
> fernando.fernandez.gonzalez@gmail.com> wrote:
> > Hello everyone,
> >
> > Can anyone explain exactly what these two parameters (startphase and
> > endphase) are and how to use them? I'm trying to launch a RowSimilarity job
> > on a 50K-row matrix (100 columns) with cosine similarity and the default
> > startphase and endphase parameters, and I'm getting extremely poor
> > performance on a quite big cluster (after 16 hours, it had only reached 3%
> > of the process). I think this could have something to do with the
> > startphase and endphase parameters. What do you think? How do these
> > parameters affect the RowSimilarity job?
> >
> > Thanks in advance.
> > Fernando.
>
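
To make the zero-based phase indexing described in the quoted reply concrete, here is a minimal sketch in Java. It is not Mahout's code: the class PhaseGate, both methods, and the pipeline length of 4 jobs are hypothetical and chosen purely for illustration. It only models the rule stated in the reply, namely that a job runs when its index lies between the start phase and the end phase, inclusive.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of phase gating; not taken from the Mahout sources.
public class PhaseGate {

    // True if the zero-based job index lies in [startPhase, endPhase].
    public static boolean shouldRun(int startPhase, int endPhase, int phase) {
        return phase >= startPhase && phase <= endPhase;
    }

    // Zero-based indices of the jobs that would run in a pipeline of totalJobs jobs.
    public static List<Integer> phasesToRun(int startPhase, int endPhase, int totalJobs) {
        List<Integer> selected = new ArrayList<>();
        for (int phase = 0; phase < totalJobs; phase++) {
            if (shouldRun(startPhase, endPhase, phase)) {
                selected.add(phase);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        // Assumed pipeline of 4 jobs; start phase 1 and end phase 2
        // select the 2nd and 3rd jobs only.
        System.out.println(phasesToRun(1, 2, 4)); // prints [1, 2]
    }
}
```

Under these assumptions, job 0 is skipped, so whatever it would have produced must already exist on disk for job 1 to start correctly. That matches the caveat in the reply about the correct inputs being available.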
