hama-dev mailing list archives

From Tommaso Teofili <tommaso.teof...@gmail.com>
Subject Re: [DISCUSS] Things I'd like to focus on next
Date Sat, 06 Jun 2015 05:24:42 GMT
2015-06-05 17:49 GMT+02:00 Behroz Sikander <behroz89@gmail.com>:

> Hi,
> *>>Please feel free to contribute documentation to the Apache Hama
> wiki[1]!*
> OK. I am new to the open source world, so the procedure is new to me.
> Whenever I find something missing, I will add it.
>
> *>>We also maybe work together on it but I have no idea yet. Custom
> “Modern” or*
> *“Classic” Style? Maven website again?*
> OK. I do not quite understand what you mean by Modern or Classic style.
> Does Apache provide some kind of CMS to manage the hosted project websites?
>
> *>>ADMM is quite interesting, and it looks like a better fit for BSP than
> MapReduce*
> *(even if HBase(?) or memory-based shared storage is used). *
> Yes, ADMM seems to be a natural fit for the BSP model because ADMM
> algorithms are iterative. In each iteration, different machines process and
> exchange data, and the algorithm keeps running until a convergence criterion
> is met.
>
> Check out Chapter 10 (page 78) of the following ADMM paper:
> https://web.stanford.edu/~boyd/papers/pdf/admm_distr_stats.pdf
>
> It discusses the implementation details of ADMM on BigData systems.
>
> *>>But I don't fully understand *
> My understanding is also limited, but if the cost function of an ML
> algorithm is convex, then it can be converted to ADMM form. Once in ADMM
> form, we can run it on a distributed system like Hama.
>
> *>>and so don't know whether it can be used as an abstraction layer for many
> ML algorithms. We'll need more investigation.*
>
> Yes, more investigation is needed. Here are a few ML algorithms already in
> ADMM form (a,b,c).
>
> a) *L1 Linear Regression* -
> https://www.dtc.umn.edu/s/resources/tsp2010oct-dlasso.pdf
> b) *L2 Logistic Regression* -
> https://intentmedia.github.io/assets/2013-10-09-presenting-at-ieee-big-data/pld_js_ieee_bigdata_2013_admm.pdf
> c) *SVM* - http://www.jmlr.org/papers/volume11/forero10a/forero10a.pdf
>
>
I don't know ADMM myself, but what you describe sounds quite similar to how
we implemented gradient descent and linear / logistic regression [1] on top
of it.
Any improvement there would of course be highly appreciated, so feel free
to open Jira issues and attach patches accordingly.

Regards,
Tommaso

[1] :
https://github.com/apache/hama/tree/trunk/ml/src/main/java/org/apache/hama/ml/regression
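
To make the ADMM discussion quoted above more concrete, here is a minimal
sketch of the iterate / exchange / check-convergence loop. It assumes the
global-consensus form of ADMM from the Boyd et al. paper linked in this thread
and a scalar least-squares objective. It runs in a single Python process and
does not use any Hama API. The function name consensus_admm, the parameters
rho, tol and max_iter, and the sample data are all made up for illustration.
Roughly, each pass of the loop would correspond to one BSP superstep.

```python
import math


def consensus_admm(partitions, rho=1.0, tol=1e-8, max_iter=1000):
    """Fit a single scalar x minimizing sum_i 0.5 * sum_j (a_ij * x - b_ij)^2.

    partitions: list of (a_values, b_values) pairs, one per simulated worker.
    Returns (z, iterations_used), where z is the consensus estimate.
    Hypothetical helper for illustration; not part of Hama.
    """
    # Each worker only needs two local sums to solve its subproblem.
    stats = [
        (sum(a * a for a in a_vals),
         sum(a * b for a, b in zip(a_vals, b_vals)))
        for a_vals, b_vals in partitions
    ]
    n = len(stats)
    x = [0.0] * n  # local estimates, one per worker
    u = [0.0] * n  # scaled dual variables, one per worker
    z = 0.0        # shared consensus value

    for k in range(1, max_iter + 1):
        # Local step: every worker minimizes its own loss plus a penalty
        # pulling it toward the current consensus (closed form here).
        x = [(sab + rho * (z - u_i)) / (saa + rho)
             for (saa, sab), u_i in zip(stats, u)]

        # Exchange step: workers share x_i + u_i and the average becomes
        # the new consensus value.
        z_old = z
        z = sum(x_i + u_i for x_i, u_i in zip(x, u)) / n

        # Dual update: accumulate each worker's disagreement with z.
        u = [u_i + x_i - z for x_i, u_i in zip(x, u)]

        # Convergence check on primal and dual residuals.
        primal = math.sqrt(sum((x_i - z) ** 2 for x_i in x))
        dual = rho * math.sqrt(n) * abs(z - z_old)
        if primal < tol and dual < tol:
            return z, k

    return z, max_iter


if __name__ == "__main__":
    # Made-up sample data split across three simulated workers.
    sample = [([1.0, 2.0], [2.1, 3.9]), ([3.0], [6.2]), ([1.0, 1.0], [1.8, 2.2])]
    estimate, iterations = consensus_admm(sample)
    print(estimate, iterations)
```

For this objective the exact answer is sum(a*b) / sum(a*a) taken over all
partitions, so the value returned by the sketch can be checked against it.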


>
> Regards,
> Behroz Sikander
>
>
>
>
> On Fri, Jun 5, 2015 at 3:19 AM, Edward J. Yoon <edward.yoon@samsung.com>
> wrote:
>
> > Please feel free to contribute documentation to the Apache Hama wiki[1]!
> > Ultimately, I'm considering improving our official website[2] under
> > HAMA-960. We could also work together on it, but I have no idea yet.
> > Custom “Modern” or “Classic” style? Maven website again?
> >
> > ADMM is quite interesting, and it looks like a better fit for BSP than
> > MapReduce (even if HBase(?) or memory-based shared storage is used). But I
> > don't fully understand it, and so don't know whether it can be used as an
> > abstraction layer for many ML algorithms. We'll need more investigation.
> >
> >
> > 1. https://wiki.apache.org/hama
> > 2. https://hama.apache.org/
> >
> > --
> > Best Regards, Edward J. Yoon
> >
> > -----Original Message-----
> > From: Behroz Sikander [mailto:behroz89@gmail.com]
> > Sent: Thursday, June 04, 2015 10:24 PM
> > To: dev@hama.apache.org
> > Subject: Re: [DISCUSS] Things I'd like to focus on next
> >
> > Hi,
> > +1.
> > Yes, the documentation needs improvement. I also saw that a book on Hama
> > is in progress. I can help with the documentation. I only found the
> > following open issue: https://issues.apache.org/jira/browse/HAMA-960.
> >
> > Something like MLBase or Mahout on top of Hama would be really nice and
> > would boost the project. Regarding machine learning algorithms, can we use
> > ADMM (a) to implement them?
> > Like https://issues.apache.org/jira/browse/SPARK-1543
> >
> > a) https://web.stanford.edu/~boyd/papers/pdf/admm_distr_stats.pdf
> >
> > Regards,
> > Behroz Sikander
> >
> > On Wed, Jun 3, 2015 at 9:48 AM, Edward J. Yoon <edwardyoon@apache.org>
> > wrote:
> >
> > > Hey,
> > >
> > > Here are a few things I'd like to focus on next.
> > >
> > > 1. Add a stream input format for listening to messages coming from
> > > third-party applications, and incremental learning algorithms.
> > > 2. Improve reliability of the system, e.g., fault tolerance, HA, etc.
> > > 3. More machine learning algorithms, such as ensemble classifiers, SVM,
> > > DNN, etc.
> > >
> > > Do you have any other suggestions?
> > >
> > > Thanks!
> > >
> > > --
> > > Best Regards, Edward J. Yoon
> > >
> >
> >
> >
>
