mahout-dev mailing list archives

From Oleksandr Olgashko <alexandrolg...@gmail.com>
Subject Re: Implementing ICA
Date Tue, 07 Jan 2014 21:30:39 GMT
I haven't worked with Spark before (I have only read its overview page).
Should I ask questions here as they arise, or would it be better to switch
to Spark's mailing lists?


2014/1/7 Sebastian Schelter <ssc@apache.org>

> IIRC that paper talks about MapReduce on a shared-memory system, not on
> a shared-nothing system such as the Hadoop implementation.
>
> As a rule of thumb, iterations in Hadoop are about 10x slower than in
> systems such as Giraph, Spark or Stratosphere.
>
> --sebastian
>
> On 07.01.2014 22:01, Oleksandr Olgashko wrote:
> > What can you say about
> > http://www.cs.stanford.edu/people/ang//papers/nips06-mapreducemulticore.pdf ?
> >
> >
> > 2014/1/7 Dmitriy Lyubimov <dlieu.7@gmail.com>
> >
> >> yes. Create working notes on how exactly to do that. (Or, what i am a bit
> >> pushing you towards, use Spark, since MR is not really an
> >> iteration-friendly platform and it looks like iterations are needed in
> >> fastICA.)
> >>
> >>
> >> On Tue, Jan 7, 2014 at 12:38 PM, Oleksandr Olgashko <alexandrolgash@gmail.com> wrote:
> >>
> >>> So the problem is to adapt ICA for MR, am i right?
> >>>
> >>>
> >>>
> >>> 2014/1/7 Dmitriy Lyubimov <dlieu.7@gmail.com>
> >>>
> >>>> i already looked at fast ICA. while it claims to be parallel, this work
> >>>> doesn't exactly map it into map reduce (or spark) paradigm and from what i
> >>>> can recollect still implies outer iterations for fitting principal
> >>>> component vectors one by one. Which means it probably already is
> >>>> MR-unfriendly by construction; Spark may show far better promise here but
> >>>> still a working notes document is required to show how exactly. that's what
> >>>> i mean.
> >>>>
> >>>>
> >>>> On Tue, Jan 7, 2014 at 1:35 AM, Oleksandr Olgashko <alexandrolgash@gmail.com> wrote:
> >>>>
> >>>>> Could you please take a look at this article?
> >>>>> http://cran.r-project.org/web/packages/fastICA/fastICA.pdf
> >>>>> I have learned that re-inventing the wheel is wrong for most problems,
> >>>>> and that a better solution usually exists. However, it often needs some
> >>>>> "grinding", so I may research those ways, in case of approval.
> >>>>>
> >>>>> About Scala: unfortunately, I have never worked with this language
> >>>>> before, but wanted to. I'd like to fill that gap in my skills, but I
> >>>>> don't know exactly where to start.
> >>>>>
> >>>>>
> >>>>> 2014/1/7 Dmitriy Lyubimov <dlieu.7@gmail.com>
> >>>>>
> >>>>>> ICA is a very useful technique for dimensionality reduction. I believe
> >>>>>> Mahout would benefit from it; however challenges are fairly significant
> >>>>>> in terms of proven parallelization technique and acceptable efficacy,
> >>>>>> which makes it hard to just "implement" (I am not familiar at this point
> >>>>>> with any concrete work on parallel ICA). So like i said before i am not
> >>>>>> very hopeful. However, if one never tries, then nothing will get ever
> >>>>>> done. who knows.
> >>>>>>
> >>>>>>
> >>>>>> On Mon, Jan 6, 2014 at 2:18 PM, Isabel Drost-Fromm <isabel@apache.org> wrote:
> >>>>>>
> >>>>>>> On Mon, Jan 06, 2014 at 10:40:45PM +0200, Oleksandr Olgashko wrote:
> >>>>>>>> Returning to the question about a topic to work on, asked 2 months ago.
> >>>>>>>> What algorithm should I implement?
> >>>>>>>
> >>>>>>> To be quite frank with you: None. Personally I'd rather see improvements
> >>>>>>> (in terms of documentation, integration, stabilisation, performance
> >>>>>>> optimisation) of the existing Mahout source.
> >>>>>>>
> >>>>>>> Feel free to take a closer look at the thread concerning "getting
> >>>>>>> involved" that we had around Christmas last year for inspiration.
> >>>>>>>
> >>>>>>>
> >>>>>>> Isabel
> >>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>
> >>>
> >>
> >
>
>
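
A minimal sketch of the loop structure discussed in the thread above. It is
not code from any of the posters and uses no Mahout or Spark API. It shows
deflation-style FastICA, where unmixing vectors are fitted one by one (the
"outer iterations") and every fixed-point step needs one full pass over the
data (a per-row "map" followed by a sum "reduce"). Assumptions: the input is
already centered and whitened, the nonlinearity is the cubic (kurtosis) one,
and every function name below is hypothetical.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalize(w):
    n = math.sqrt(dot(w, w))
    return [x / n for x in w]


def one_pass(rows, w):
    """One full pass over the data for the current w.

    Per row (the 'map'): u = w.x, emit x * u^3 and 3 * u^2.
    Over rows (the 'reduce'): average both, then return
    E[x * g(w.x)] - E[g'(w.x)] * w with g(u) = u^3 (assumed nonlinearity).
    """
    dim = len(w)
    sum_xg = [0.0] * dim
    sum_gp = 0.0
    for x in rows:
        u = dot(w, x)
        gu = u ** 3
        for j in range(dim):
            sum_xg[j] += x[j] * gu
        sum_gp += 3.0 * u * u
    n = len(rows)
    return [sum_xg[j] / n - (sum_gp / n) * w[j] for j in range(dim)]


def fastica_deflation(rows, n_components, max_iter=200, tol=1e-8, seed=0):
    """Fit unmixing vectors one at a time on whitened rows.

    Returns (vectors, passes), where passes counts full passes over the data.
    """
    rng = random.Random(seed)
    dim = len(rows[0])
    found = []
    passes = 0
    for _ in range(n_components):        # outer loop: one component at a time
        w = normalize([rng.gauss(0.0, 1.0) for _ in range(dim)])
        for _ in range(max_iter):        # inner loop: fixed-point iterations
            w_new = one_pass(rows, w)    # sequential dependency on previous w
            passes += 1
            for v in found:              # deflate against earlier components
                c = dot(w_new, v)
                w_new = [a - c * b for a, b in zip(w_new, v)]
            w_new = normalize(w_new)
            # The sign of w may flip between steps, so compare |w_new . w| to 1.
            converged = abs(abs(dot(w_new, w)) - 1.0) < tol
            w = w_new
            if converged:
                break
        found.append(w)
    return found, passes


def make_whitened_mixture(n, angle, seed=0):
    """Two unit-variance uniform sources mixed by a rotation.

    A rotation keeps the data (approximately) white, so no whitening step
    is needed for this toy input. Returns (rows, true unmixing directions).
    """
    rng = random.Random(seed)
    ca, sa = math.cos(angle), math.sin(angle)
    half_width = math.sqrt(3.0)  # uniform on [-sqrt(3), sqrt(3)] has variance 1
    rows = []
    for _ in range(n):
        s1 = rng.uniform(-half_width, half_width)
        s2 = rng.uniform(-half_width, half_width)
        rows.append([ca * s1 - sa * s2, sa * s1 + ca * s2])
    return rows, [[ca, sa], [-sa, ca]]


if __name__ == "__main__":
    rows, true_dirs = make_whitened_mixture(2000, angle=0.6, seed=1)
    vectors, passes = fastica_deflation(rows, 2)
    print("unmixing vectors:", vectors)
    print("full passes over the data:", passes)
```

Each call to one_pass depends on the w produced by the previous call, so the
passes cannot be merged into a single job. On a platform that launches a
separate job per pass, the pass count is therefore the quantity to watch.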
