systemml-dev mailing list archives

From Nakul Jindal <naku...@gmail.com>
Subject Re: GSoc 2017
Date Sun, 02 Apr 2017 03:27:01 GMT
Hi Krishna,

Here are some questions/remarks I have about parts of your proposal:

In the section titled Summary -

"The systematic evaluation of performance can be measured with performance
tests and micro-benchmarks"
We currently do not have any micro-benchmarks. Do you plan on adding any?
(It would be awesome, but remember to keep the number of tasks reasonable
given the time frame and your familiarity with the project.)

Your summary section feels like it's generally applicable to performance
testing on any project, which is good. However, when it comes to talking
about what you'd actually be doing, I see - "build a benchmark
infrastructure and conduct experiments, that compare different choices in
critical parts (sparsity thresholds, optimisation decisions, etc..)".
Going over each point:

1. "build a benchmark infrastructure" - OK, I guess this subsumes pretty
much all the tasks involved.
2. "conduct experiments" - sure, although I think you mean testing your
benchmarking infrastructure. Please correct me if this is not what you meant.
3. "that compare different choices in critical parts"
a. "sparsity thresholds" - awesome. You'd need to figure out what SystemML
already does and what to add.
b. "optimization decisions" - could you provide an example or two of what
exactly you mean by this? Do you mean to enable and/or disable certain
optimizations, run the perf suite, and automate that process? Or
something else?
c. "etc" - more detail would be nice here. It would be nice to know what
exactly you are committing to.


In the section titled Deliverables -

You mention
- "automation for all performance tests" - awesome! this is the primary task
- "automatic scripts to test performance on a cloud provider" - this is
great
- "web dashboard" - awesome! this is a nice-to-have

But before the "cloud provider" and "web dashboard" tasks, we'd like to
robustly check for errors, record performance numbers, and generate
reports. (Tasks 2 - 6 on https://issues.apache.org/jira/browse/SYSTEMML-1451).
I see that you've mentioned some of these tasks in your "Project milestones"
section as "Understand metrics to be captured like time, memory, errors".
It'd be good to put them here as well.

Remember, you might also need to change the way SystemML reports errors and
performance numbers to complete your tasks. You, along with the currently
active members of SystemML, might need to change the algorithms being tested
as well.

In the section titled "Project Milestones" -
Your project timeline looks good: the initial set of things to do before May
30, and the fact that you've set aside the final week as a buffer. You have
dug down into a week-by-week schedule, which is good. I have some
suggestions though:

You need to
T1. Understand what is happening now and try it out for yourself
T2. Automate this process
T3. Test that this automated process works as expected (and
make it robust)
T4. Add additional capabilities (like micro-benchmarks and/or
parameterizing the tests and/or running them with sparse and dense sets)
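
To make the "parameterizing the tests" part of T4 concrete, here is a minimal
sketch in Python. The `PerfCase` structure, the `build_cases` helper, and the
algorithm names and sparsity values in the usage note are placeholders I made
up for illustration; they are not something the current perf suite defines.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class PerfCase:
    """One perf-test configuration (hypothetical structure)."""
    algorithm: str
    rows: int
    cols: int
    sparsity: float  # fraction of non-zero cells; 1.0 means fully dense


def build_cases(algorithms, shapes, sparsities):
    """Return the cross product of algorithms, matrix shapes and sparsities."""
    return [
        PerfCase(algorithm, rows, cols, sparsity)
        for algorithm, (rows, cols), sparsity in product(algorithms, shapes, sparsities)
    ]
```

For example, `build_cases(["algo_a", "algo_b"], [(10000, 100)], [1.0, 0.01])`
yields four cases: each algorithm once on dense data and once on sparse data.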

For each of the tasks that you mention in your deliverables, could you
please think about how you'd spend each week doing either T1-3 for a
deliverable that is now being done manually or T4 for one that is not
being done at all right now?
Please revisit some of the tasks on your timeline with this in mind.

I'd also ask that you set some deliverable(s) for phase 1 (due on June 26),
phase 2 (due on July 26) and the final phase (ends on Aug 29).

A suggestion for the deliverables, if you wanted to be really ambitious and
complete every task possible:
Phase 1 - implement infrastructure to launch the perf suite, detect
errors, and report performance numbers in a plain text file
Phase 2 - implement scripts to compare performance against older versions
of SystemML and other packages (Spark MLlib) and implement a mechanism to
generate report(s) with errors and performance information in a spreadsheet
or pdf or on a web interface
Phase 3 - add additional perf tests for more algorithms, different sparsity
thresholds and optimization levels and include them in the reports. Also
implement and test scripts to run the perf suite on a cloud provider,
triggered through a web UI.
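
As a rough illustration of the Phase 1 idea (launch a run, detect errors,
write numbers to a plain text file), here is a minimal Python sketch. The
function names and the tab-separated output format are my own assumptions,
and treating a non-zero exit code as "error" is only a first approximation; a
real run may also need its logs inspected.

```python
import subprocess
import time


def run_and_time(cmd, timeout_s=3600):
    """Run cmd (a list of strings); return (status, seconds, last stderr line)."""
    start = time.perf_counter()
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return "TIMEOUT", time.perf_counter() - start, ""
    elapsed = time.perf_counter() - start
    # Assumption: a non-zero exit code means the run failed.
    status = "OK" if proc.returncode == 0 else "ERROR"
    err_lines = proc.stderr.strip().splitlines()
    return status, elapsed, (err_lines[-1] if err_lines else "")


def append_result(path, name, status, seconds, detail):
    """Append one tab-separated result line to a plain text report."""
    with open(path, "a", encoding="utf-8") as report:
        report.write(f"{name}\t{status}\t{seconds:.3f}\t{detail}\n")
```

Each perf test would call `run_and_time` with whatever command launches it
and then `append_result` with the returned values, so one failing test does
not stop the rest of the suite.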

Something very conservative could be:
Phase 1 - automate perf suite and report perf numbers
Phase 2 - make error reporting and handling robust, compare against
previous versions of SystemML
Phase 3 - add additional algorithms to the test suite.
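
For the "compare against previous versions" step, a minimal sketch in Python
of what the comparison could look like. The `find_regressions` name, the
dictionary layout (test name mapped to runtime in seconds), and the 10%
tolerance are assumptions for illustration only.

```python
def find_regressions(baseline, current, tolerance=0.10):
    """Return {test: (old_seconds, new_seconds)} for tests that got slower.

    A test counts as a regression when its new runtime exceeds the baseline
    runtime by more than `tolerance` (0.10 means 10%). Tests missing from
    `current` are skipped rather than reported.
    """
    regressions = {}
    for name, old_seconds in baseline.items():
        new_seconds = current.get(name)
        if new_seconds is not None and new_seconds > old_seconds * (1.0 + tolerance):
            regressions[name] = (old_seconds, new_seconds)
    return regressions
```

The tolerance is there because runtimes are noisy from run to run; the right
value would have to be found by experiment.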

These are just suggestions; tweak them as you see fit.
Having a deliverable attached to the end of a phase is a good thing.

Hope I am not being too critical, and hopefully this helps.

-Nakul




On Fri, Mar 31, 2017 at 5:13 PM, Krishna Kalyan <krishnakalyan3@gmail.com>
wrote:

> Hello All,
> Based on "SYSTEMML-1451" and the relevant SystemML source code, I have
> updated the draft proposal. Please have a look and share your valuable
> feedback.
>
> https://docs.google.com/document/d/1DKWZTWvrvs73GYa1q3XEN5GFo8ALGjLH2DrIfRsJksA/edit?usp=sharing
>
> Regards,
> Krishna
>
> On Thu, Mar 30, 2017 at 8:20 PM, Krishna Kalyan <krishnakalyan3@gmail.com>
> wrote:
>
>> Hello All,
>> I have created a proposal for
>>
>> d) Perftest : automated performance tests of algorithms
>> (I am most comfortable with bash scripting and Python)
>>
>> https://docs.google.com/document/d/1DKWZTWvrvs73GYa1q3XEN5GFo8ALGjLH2DrIfRsJksA/edit?usp=sharing
>>
>> Please share your feedback on the proposal. If someone from the community
>> could mentor, it would be great.
>>
>> Regards,
>> Krishna
>>
>> On Mon, Mar 27, 2017 at 6:07 PM, Krishna Kalyan <krishnakalyan3@gmail.com>
>> wrote:
>>
>>> Thanks Nakul,
>>> Replied to the JIRA thread.
>>>
>>> Cheers,
>>> Krishna
>>>
>>> On Mon, Mar 27, 2017 at 2:51 PM, Nakul Jindal <nakul02@gmail.com> wrote:
>>>
>>>> Hi Krishna,
>>>>
>>>> We have 2 proposals up :
>>>> https://issues.apache.org/jira/issues/?filter=12339687&jql=project%20%3D%20SYSTEMML%20AND%20labels%20%3D%20gsoc2017%20ORDER%20BY%20created%20DESC
>>>>
>>>> Would you be interested in any of these?
>>>> If you are specifically interested in the Python DSL project, we can
>>>> look for more volunteers or I could just volunteer to mentor it.
>>>>
>>>> -Nakul
>>>>
>>>>
>>>>
>>>> On Fri, Mar 24, 2017 at 12:05 PM, Nakul Jindal <nakul02@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi Krishna,
>>>>>
>>>>> We are working on putting together some proposals. The one I created is
>>>>> for a GPU-based project.
>>>>> https://issues.apache.org/jira/browse/SYSTEMML-1436
>>>>> Be on the lookout for more.
>>>>>
>>>>> Thanks,
>>>>> Nakul
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Tue, Mar 21, 2017 at 10:01 AM, Krishna Kalyan <krishnakalyan3@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hello Adina and Arvind, thank you for your reply,
>>>>>> I am open to writing a proposal with a mentor and would appreciate it
>>>>>> if we could take action quickly on this.
>>>>>>
>>>>>> Best Regards,
>>>>>> Krishna
>>>>>>
>>>>>> On Sun, Mar 19, 2017 at 6:02 PM, Adina Crainiceanu <adina@usna.edu>
>>>>>> wrote:
>>>>>>
>>>>>> > Apache Software Foundation applied and was accepted for GSOC. I believe
>>>>>> > SystemML could still participate as part of ASF if interested (record your
>>>>>> > ideas in JIRA and put gsoc2017 as label). See messages on this subject on
>>>>>> > the community.apache.org mailing list from Ulrich Stark.
>>>>>> > The following page also has useful info, even if it is not updated for this
>>>>>> > year: http://community.apache.org/gsoc.html - mentors need to register very
>>>>>> > soon.
>>>>>> >
>>>>>> > Best regards,
>>>>>> > Adina
>>>>>> >
>>>>>> >
>>>>>> > On Sun, Mar 19, 2017 at 3:51 PM, Arvind Surve <acs_s@yahoo.com.invalid>
>>>>>> > wrote:
>>>>>> >
>>>>>> > > Thanks Krishna for your interest.
>>>>>> > > Unfortunately we could not submit a topic to GSoc on time. However, please
>>>>>> > > feel free to leverage SystemML for your use cases and to contribute
>>>>>> > > to SystemML where possible.
>>>>>> > > Please let us know if you have any questions.
>>>>>> > >
>>>>>> > > Arvind Surve | Spark Technology Center  | http://www.spark.tc/
>>>>>> > >
>>>>>> > >       From: Krishna Kalyan <krishnakalyan3@gmail.com>
>>>>>> > >  To: dev@systemml.incubator.apache.org
>>>>>> > >  Sent: Saturday, March 18, 2017 8:18 AM
>>>>>> > >  Subject: Re: GSoc 2017
>>>>>> > >
>>>>>> > > Hello All,
>>>>>> > > A gentle ping. Student applications open in a couple of days. I would like to
>>>>>> > > work on 'Support for Python DSLs'.
>>>>>> > > However, for now I am not sure how to proceed.
>>>>>> > >
>>>>>> > > Thank you,
>>>>>> > > Krishna
>>>>>> > >
>>>>>> > > On Thu, Jan 12, 2017 at 6:08 PM, <dusenberrymw@gmail.com> wrote:
>>>>>> > >
>>>>>> > > > Yeah helping to build out our Python DSL into a full-out replacement for
>>>>>> > > > the current "DML" language would be great, and we'd be quite supportive!
>>>>>> > > >
>>>>>> > > > -Mike
>>>>>> > > >
>>>>>> > > > --
>>>>>> > > >
>>>>>> > > > Mike Dusenberry
>>>>>> > > > GitHub: github.com/dusenberrymw
>>>>>> > > > LinkedIn: linkedin.com/in/mikedusenberry
>>>>>> > > >
>>>>>> > > > Sent from my iPhone.
>>>>>> > > >
>>>>>> > > >
>>>>>> > > > > On Jan 12, 2017, at 2:58 PM, fschueler@posteo.de wrote:
>>>>>> > > > >
>>>>>> > > > > Hi Krishna,
>>>>>> > > > >
>>>>>> > > > > cool to see that you're interested in SystemML!
>>>>>> > > > >
>>>>>> > > > > From your list I personally think that a) and d) would be well suited
>>>>>> > > > > for projects, especially a good python DSL is a high priority.
>>>>>> > > > >
>>>>>> > > > > We will apply as an organization to GSoC once organization applications
>>>>>> > > > > are open (Jan. 19th) and I think we will find mentors for at least a) and
>>>>>> > > > > d). If you already want to take a look at what is currently there, I
>>>>>> > > > > suggest to look at our python APIs and documentation. If you want to take
>>>>>> > > > > on the DSL project it might also be a good idea to look into the DML
>>>>>> > > > > documentation and related papers to see what we need to support.
>>>>>> > > > >
>>>>>> > > > > The proposals will probably circulate on the mailinglist, too, so keep
>>>>>> > > > > an eye on that :)
>>>>>> > > > >
>>>>>> > > > > -Felix
>>>>>> > > > >
>>>>>> > > > > Am 12.01.2017 23:13 schrieb Krishna Kalyan:
>>>>>> > > > >> Hello All,
>>>>>> > > > >> Thank you for your wonderful replies.
>>>>>> > > > >> Tasks that I am interested in:
>>>>>> > > > >> a) Support for Python DSLs
>>>>>> > > > >> b) Python wrappers for all existing algorithms
>>>>>> > > > >> c) GPU support
>>>>>> > > > >> d) Perftest : automated performance tests of algorithms
>>>>>> > > > >> I am also willing to work on the tasks that SystemML community think are
>>>>>> > > > >> important.
>>>>>> > > > >> Regards,
>>>>>> > > > >> Krishna
>>>>>> > > > >> On Fri, Jan 6, 2017 at 10:14 PM, Mike Dusenberry <dusenberrymw@gmail.com>
>>>>>> > > > >> wrote:
>>>>>> > > > >>> Hi Krishna!  Welcome, and thanks for your interest!
>>>>>> > > > >>> We would definitely be excited to collaborate with you on a GSOC project.
>>>>>> > > > >>> We've started another thread to discuss possible new proposals, and we
>>>>>> > > > >>> would also be quite interested in any particular proposal that you might
>>>>>> > > > >>> like to generate tailored towards your interests.  Copied from the other
>>>>>> > > > >>> thread, some possible ideas could include: building out a full ML demo to
>>>>>> > > > >>> solve a real, large-scale problem that would benefit from a distributed
>>>>>> > > > >>> approach; overall performance improvements that address a full class, or
>>>>>> > > > >>> wider area, of ML algorithms, rather than a single, specific script;
>>>>>> > > > >>> infrastructure for [performance] testing, and identification of wide areas
>>>>>> > > > >>> of improvement; helping with building out fully-featured, clean,
>>>>>> > > > >>> well-tested DSLs in Python & Scala (we've started, but it would be good to
>>>>>> > > > >>> continue stressing them -- we could even aim to replace DML with the DSLs);
>>>>>> > > > >>> etc.  Overall, we want to improve the ability of the user to work on a wide
>>>>>> > > > >>> range of large-scale, distributed ML problems in a simple and easy manner
>>>>>> > > > >>> on top of Spark.
>>>>>> > > > >>> In the meantime, you could explore our recent open issues [1] and even
>>>>>> > > > >>> begin discussions or contributions on any of the items.  You could also
>>>>>> > > > >>> view our recent roadmap discussion thread on the mailing list, starting
>>>>>> > > > >>> with the first email [2]:
>>>>>> > > > >>> [1]:
>>>>>> > > > >>> https://issues.apache.org/jira/issues/?jql=project%20%3D%20SYSTEMML%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20updated%20DESC%2C%20priority%20DESC
>>>>>> > > > >>> [2]:
>>>>>> > > > >>> http://mail-archives.apache.org/mod_mbox/incubator-systemml-dev/201701.mbox/%3C9eb780f0-ff28-c702-117c-bad74059930d@gmail.com%3E
>>>>>> > > > >>> - Mike
>>>>>> > > > >>> --
>>>>>> > > > >>> Michael W. Dusenberry
>>>>>> > > > >>> GitHub: github.com/dusenberrymw
>>>>>> > > > >>> LinkedIn: linkedin.com/in/mikedusenberry
>>>>>> > > > >>> On Fri, Jan 6, 2017 at 12:34 PM, Luciano Resende <luckbr1975@gmail.com>
>>>>>> > > > >>> wrote:
>>>>>> > > > >>> > As some folks have described on this thread, it would be great to get you
>>>>>> > > > >>> > familiarized with SystemML.
>>>>>> > > > >>> >
>>>>>> > > > >>> > In parallel, I would look for a mentor from the active committer list and
>>>>>> > > > >>> > start working on a project proposal which could be based on the recent
>>>>>> > > > >>> > Roadmap discussion [1].
>>>>>> > > > >>> >
>>>>>> > > > >>> > If you are looking for some guidance on how Apache participate on GSOC,
>>>>>> > > > >>> > take a look at the following resources [2] and [3], and don't hesitate to
>>>>>> > > > >>> > ask questions here.
>>>>>> > > > >>> >
>>>>>> > > > >>> >
>>>>>> > > > >>> > [1]
>>>>>> > > > >>> > https://www.mail-archive.com/dev@systemml.incubator.apache.org/msg01199.html
>>>>>> > > > >>> > [2] http://community.apache.org/gsoc.html
>>>>>> > > > >>> > [3]
>>>>>> > > > >>> > http://www.slideshare.net/luckbr1975/how-mentoring-can-help-you-start-contributing-to-open-source
>>>>>> > > > >>> > On Thu, Jan 5, 2017 at 3:15 PM, Krishna Kalyan <krishnakalyan3@gmail.com>
>>>>>> > > > >>> > wrote:
>>>>>> > > > >>> >
>>>>>> > > > >>> > > Hello Developers,
>>>>>> > > > >>> > > I am Krishna, currently a 2nd year Masters student (MSc. in Data Mining)
>>>>>> > > > >>> > > in Barcelona studying at Université Polytechnique de Catalogne.
>>>>>> > > > >>> > > I was interested in contributing to SystemML this year under the GSoC
>>>>>> > > > >>> > > program. Could anyone please guide me on how to go about it? (I understand
>>>>>> > > > >>> > > that I need to write a proposal.)
>>>>>> > > > >>> > >
>>>>>> > > > >>> > > Related Experience:
>>>>>> > > > >>> > > My masters is mostly focussed on data mining techniques. Before my masters,
>>>>>> > > > >>> > > I was a data engineer with IBM (India). I was responsible for managing a
>>>>>> > > > >>> > > 50 node Hadoop Cluster for more than a year. Most of my time was spent
>>>>>> > > > >>> > > optimising and writing ETL (Apache Pig) jobs.
>>>>>> > > > >>> > >
>>>>>> > > > >>> > > I am the most comfortable with Python followed by R and Scala.
>>>>>> > > > >>> > >
>>>>>> > > > >>> > > My Webpage
>>>>>> > > > >>> > > kkalyan.in
>>>>>> > > > >>> > >
>>>>>> > > > >>> > > My Spark Pull Requests
>>>>>> > > > >>> > > https://github.com/apache/spark/pulls?utf8=%E2%9C%93&q=is%3Apr%20author%3Akrishnakalyan3%20
>>>>>> > > > >>> > >
>>>>>> > > > >>> > > Thank you so much,
>>>>>> > > > >>> > > Krishna
>>>>>> > > > >>> > >
>>>>>> > > > >>> >
>>>>>> > > > >>> >
>>>>>> > > > >>> >
>>>>>> > > > >>> > --
>>>>>> > > > >>> > Luciano Resende
>>>>>> > > > >>> > http://twitter.com/lresende1975
>>>>>> > > > >>> > http://lresende.blogspot.com/
>>>>>> > > > >>> >
>>>>>> > > >
>>>>>> > >
>>>>>> > >
>>>>>> > >
>>>>>> >
>>>>>> >
>>>>>> >
>>>>>> > --
>>>>>> > Dr. Adina Crainiceanu
>>>>>> > Associate Professor, Computer Science Department
>>>>>> > United States Naval Academy
>>>>>> > 410-293-6822
>>>>>> > adina@usna.edu
>>>>>> > http://www.usna.edu/Users/cs/adina/
>>>>>> >
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>
