systemml-dev mailing list archives

From Krishna Kalyan <krishnakaly...@gmail.com>
Subject Re: GSoc 2017
Date Sun, 02 Apr 2017 18:39:50 GMT
Hello Nakul,
My comments in *Italics* below.

On Sat, Apr 1, 2017 at 11:27 PM, Nakul Jindal <nakul02@gmail.com> wrote:

> Hi Krishna,
>
> Here are some questions/remarks i have about parts of your proposal:
>
> In the section titled Summary -
>
> "The systematic evaluation of performance can be measured with
> performance tests and micro-benchmarks"
> We currently do not have any micro benchmarks. Do you plan on adding any?
> (It would be awesome, but remember to keep the number of tasks reasonable
> given the time frame and your familiarity with the project)
>
*- Removed micro-benchmarks from the proposal.*

>
> Your summary section feels like it's generally applicable for performance
> testing on any project, which is good. However, when it comes to talking
> about what you'd actually be doing, I see - " build a benchmark
> infrastructure and conduct experiments, that compare different choices in
> critical parts (sparsity thresholds, optimisation decisions, etc..)".
>
*-  I agree and have made these changes.*

> Going over each point:
>
> 1. "build a benchmark infrastructure" - ok, i guess this subsumes pretty
> much all the tasks involved
> 2. "conduct experiments" - sure, although I think you mean testing your
> benchmarking infrastructure, please correct me if this is not what you meant
>
>
> 3. "that compare different choices in critical parts"
> a. "sparsity thresholds" - awesome. You'd need to figure out what SystemML
> already does and what to add.
> b. "optimization decisions" - could you provide an example or two of what
> exactly you mean by this. Do you mean to enable and/or disable certain
> optimizations and run the perf suite and also automate the process? or
> something else?
> c. "etc" - more detail would be nice here. It would be nice to know what
> exactly you are committing to.
*- Will add more details in this section.*
>
> In the section titled Deliverables -
>
> You mention
> - "automation for all performance tests" - awesome! this is the primary
> task
> - "automatic scripts to test performance on a cloud provider" - this is
> great
> - "web dashboard" - awesome! this is a nice-to-have
>
> But before the "cloud provider" and "web dashboard" task, we'd like to
> robustly check for errors and record performance numbers and generate
> reports. (Tasks 2 - 6 on
> https://issues.apache.org/jira/browse/SYSTEMML-1451). I see that you've
> mentioned some of these tasks in your "Project milestones" section as
> "Understand metrics to be captured like time, memory, errors". It'd be
> good to put them here as well.
>
*- Will add this information under Deliverables*

>
> Remember, you might also need to change the way SystemML reports errors
> and performance numbers to complete your tasks. You, along with the
> currently active members of SystemML might need to change the algorithms
> being tested as well.
>
*- Sure, I will keep this in mind and account for it in the proposal.*

>
> In the section titled "Project Milestones" -
> Your project timeline looks good, the initial set of things to do before
> May 30 and the fact that you've set aside the final week for buffer. You
> have dug down into a week by week schedule, which is good. I have some
> suggestions though:
>
> You need to
> T1. Understand what is happening now, try it out for yourself
>
*- Yes, I am following the documentation to simulate benchmarks on my local
system. *

> T2. You need to automate this process
> T3. You need to test that this automated process works as expected (and
> make it robust)
> T4. You need to add additional capabilities (like micro-benchmarks and/or
> parameterizing the tests and/or running it with sparse and dense sets)
>
*- I will account for T3 and T4 more explicitly in my proposal.*


> For each of the tasks that you mention in your deliverables, could you
> please think about how you'd spend each week doing either T1-3 for a
> deliverable that is now being done manually and T4 for one that is not
> being done at all right now?
> Please revisit some of the tasks on your timeline with this in mind.
>
> I'd also ask that you set some deliverable(s) for phase 1 (due on June
> 26), phase 2 (due on July 26) and the final phase (ends on Aug 29).
>
> A suggestion for the deliverables, if you wanted to be really ambitious
> and complete every task possible :
> Phase 1 - implement infrastructure to launch perf suite and to detect
> errors & report performance numbers in a plain text file
> Phase 2 - implement scripts to compare performance against older versions
> of SystemML and other packages (Spark MLLib) and implement mechanism to
> generate report(s) with errors and performance information in a spreadsheet
> or pdf or on a web interface
> Phase 3 - add additional perf tests for more algorithms, different
> sparsity thresholds and optimization levels and include them in the
> reports. Also implement and test scripts to run the perf suite on a cloud
> provider; doing this through a web UI.
>
> Something very conservative could be:
> Phase 1 - automate perf suite and report perf numbers
> Phase 2 - make error reporting and handling robust, compare against
> previous versions of SystemML
> Phase 3 - add additional algorithms to the test suite
>
*- I would prefer taking the conservative approach here.*

>
> These are just suggestions, tweak them as you see fit.
> Having a deliverable attached to the end of a phase is a good thing.
>
> Hope I am not being too critical and hopefully this helps
>
*- Not at all, I appreciate your detailed feedback.*

*- Could you also let me know the co-mentors for this project? I am
working on the proposal and will share an updated version soon.*


> -Nakul
>
>
>
>
> On Fri, Mar 31, 2017 at 5:13 PM, Krishna Kalyan <krishnakalyan3@gmail.com>
> wrote:
>
>> Hello All,
>> Based on "SYSTEMML-1451" and  relevant SystemML source code, I have
>> updated the draft proposal. Please have a look and share your valuable
>> feedback.
>>
>> https://docs.google.com/document/d/1DKWZTWvrvs73GYa1q3XEN5GFo8ALGjLH2DrIfRsJksA/edit?usp=sharing
>>
>> Regards,
>> Krishna
>>
>> On Thu, Mar 30, 2017 at 8:20 PM, Krishna Kalyan
>> <krishnakalyan3@gmail.com> wrote:
>>
>>> Hello All,
>>> I have created a proposal for
>>>
>>> d) Perftest : automated performance tests of algorithms
>>> (I am most comfortable with bash scripting and Python)
>>>
>>> https://docs.google.com/document/d/1DKWZTWvrvs73GYa1q3XEN5GFo8ALGjLH2DrIfRsJksA/edit?usp=sharing
>>>
>>> Please share your feedback on the proposal. If someone from the
>>> community could mentor, it would be great.
>>>
>>> Regards,
>>> Krishna
>>>
>>> On Mon, Mar 27, 2017 at 6:07 PM, Krishna Kalyan <
>>> krishnakalyan3@gmail.com> wrote:
>>>
>>>> Thanks Nakul,
>>>> Replied to the JIRA thread.
>>>>
>>>> Cheers,
>>>> Krishna
>>>>
>>>> On Mon, Mar 27, 2017 at 2:51 PM, Nakul Jindal <nakul02@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi Krishna,
>>>>>
>>>>> We have 2 proposals up :
>>>>> https://issues.apache.org/jira/issues/?filter=12339687&jql=project%20%3D%20SYSTEMML%20AND%20labels%20%3D%20gsoc2017%20ORDER%20BY%20created%20DESC
>>>>>
>>>>> Would you be interested in any of these?
>>>>> If you are specifically interested in the Python DSL project, we can
>>>>> look for more volunteers or I could just volunteer to mentor it.
>>>>>
>>>>> -Nakul
>>>>>
>>>>>
>>>>>
>>>>> On Fri, Mar 24, 2017 at 12:05 PM, Nakul Jindal <nakul02@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi Krishna,
>>>>>>
>>>>>> We are working on putting together some proposals. I created one for a
>>>>>> GPU based project.
>>>>>> https://issues.apache.org/jira/browse/SYSTEMML-1436
>>>>>> Be on the lookout for more.
>>>>>>
>>>>>> Thanks,
>>>>>> Nakul
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Tue, Mar 21, 2017 at 10:01 AM, Krishna Kalyan <
>>>>>> krishnakalyan3@gmail.com> wrote:
>>>>>>
>>>>>>> Hello Adina and Arvind, thank you for your replies,
>>>>>>> I am open to writing a proposal with a mentor and would appreciate
>>>>>>> it if we could take action quickly on this.
>>>>>>>
>>>>>>> Best Regards,
>>>>>>> Krishna
>>>>>>>
>>>>>>> On Sun, Mar 19, 2017 at 6:02 PM, Adina Crainiceanu <adina@usna.edu>
>>>>>>> wrote:
>>>>>>>
>>>>>>> > Apache Software Foundation applied and was accepted for GSOC. I believe
>>>>>>> > SystemML could still participate as part of ASF if interested (record your
>>>>>>> > ideas in JIRA and put gsoc2017 as label). See messages on this subject on
>>>>>>> > the community.apache.org mailing list from Ulrich Stark.
>>>>>>> > The following page also has useful info, even if it is not updated for this
>>>>>>> > year: http://community.apache.org/gsoc.html - mentors need to register very
>>>>>>> > soon.
>>>>>>> >
>>>>>>> > Best regards,
>>>>>>> > Adina
>>>>>>> >
>>>>>>> >
>>>>>>> > On Sun, Mar 19, 2017 at 3:51 PM, Arvind Surve <acs_s@yahoo.com.invalid>
>>>>>>> > wrote:
>>>>>>> >
>>>>>>> > > Thanks Krishna for your interest.
>>>>>>> > > Unfortunately we could not submit a topic to GSoC on time. However,
>>>>>>> > > please feel free to leverage SystemML for your use cases and to
>>>>>>> > > contribute to SystemML where possible.
>>>>>>> > > Please let us know if you have any question.
>>>>>>> > >
>>>>>>> > > Arvind Surve | Spark Technology Center  | http://www.spark.tc/
>>>>>>> > >
>>>>>>> > >       From: Krishna Kalyan <krishnakalyan3@gmail.com>
>>>>>>> > >  To: dev@systemml.incubator.apache.org
>>>>>>> > >  Sent: Saturday, March 18, 2017 8:18 AM
>>>>>>> > >  Subject: Re: GSoc 2017
>>>>>>> > >
>>>>>>> > > Hello All,
>>>>>>> > > A gentle ping. Student applications open in a couple of days. I would
>>>>>>> > > like to work on 'Support for Python DSLs'.
>>>>>>> > > However for now I am not sure on how to proceed.
>>>>>>> > >
>>>>>>> > > Thank you,
>>>>>>> > > Krishna
>>>>>>> > >
>>>>>>> > > On Thu, Jan 12, 2017 at 6:08 PM, <dusenberrymw@gmail.com> wrote:
>>>>>>> > >
>>>>>>> > > > Yeah helping to build out our Python DSL into a full-out replacement
>>>>>>> > > > for the current "DML" language would be great, and we'd be quite
>>>>>>> > > > supportive!
>>>>>>> > > >
>>>>>>> > > > -Mike
>>>>>>> > > >
>>>>>>> > > > --
>>>>>>> > > >
>>>>>>> > > > Mike Dusenberry
>>>>>>> > > > GitHub: github.com/dusenberrymw
>>>>>>> > > > LinkedIn: linkedin.com/in/mikedusenberry
>>>>>>> > > >
>>>>>>> > > > Sent from my iPhone.
>>>>>>> > > >
>>>>>>> > > >
>>>>>>> > > > > On Jan 12, 2017, at 2:58 PM, fschueler@posteo.de wrote:
>>>>>>> > > > >
>>>>>>> > > > > Hi Krishna,
>>>>>>> > > > >
>>>>>>> > > > > cool to see that you're interested in SystemML!
>>>>>>> > > > >
>>>>>>> > > > > From your list I personally think that a) and d) would be well suited
>>>>>>> > > > > for projects, especially a good python DSL is a high priority.
>>>>>>> > > > >
>>>>>>> > > > > We will apply as an organization to GSoC once organization applications
>>>>>>> > > > > are open (Jan. 19th) and I think we will find mentors for at least a) and
>>>>>>> > > > > d). If you already want to take a look at what is currently there, I
>>>>>>> > > > > suggest to look at our python APIs and documentation. If you want to take
>>>>>>> > > > > on the DSL project it might also be a good idea to look into the DML
>>>>>>> > > > > documentation and related papers to see what we need to support.
>>>>>>> > > > > The proposals will probably circulate on the mailing list, too, so keep
>>>>>>> > > > > an eye on that :)
>>>>>>> > > > >
>>>>>>> > > > > -Felix
>>>>>>> > > > >
>>>>>>> > > > > On 12.01.2017 23:13, Krishna Kalyan wrote:
>>>>>>> > > > >> Hello All,
>>>>>>> > > > >> Thank you for your wonderful replies.
>>>>>>> > > > >> Tasks that I am interested in:
>>>>>>> > > > >> a) Support for Python DSLs
>>>>>>> > > > >> b) Python wrappers for all existing algorithms
>>>>>>> > > > >> c) GPU support
>>>>>>> > > > >> d) Perftest : automated performance tests of algorithms
>>>>>>> > > > >> I am also willing to work on the tasks that SystemML community think are
>>>>>>> > > > >> important.
>>>>>>> > > > >> Regards,
>>>>>>> > > > >> Krishna
>>>>>>> > > > >> On Fri, Jan 6, 2017 at 10:14 PM, Mike Dusenberry <dusenberrymw@gmail.com>
>>>>>>> > > > >> wrote:
>>>>>>> > > > >>> Hi Krishna!  Welcome, and thanks for your interest!
>>>>>>> > > > >>> We would definitely be excited to collaborate with you on a GSOC project.
>>>>>>> > > > >>> We've started another thread to discuss possible new proposals, and we
>>>>>>> > > > >>> would also be quite interested in any particular proposal that you might
>>>>>>> > > > >>> like to generate tailored towards your interests.  Copied from the other
>>>>>>> > > > >>> thread, some possible ideas could include: building out a full ML demo to
>>>>>>> > > > >>> solve a real, large-scale problem that would benefit from a distributed
>>>>>>> > > > >>> approach; overall performance improvements that address a full class, or
>>>>>>> > > > >>> wider area, of ML algorithms, rather than a single, specific script;
>>>>>>> > > > >>> infrastructure for [performance] testing, and identification of wide areas
>>>>>>> > > > >>> of improvement; helping with building out fully-featured, clean,
>>>>>>> > > > >>> well-tested DSLs in Python & Scala (we've started, but it would be good to
>>>>>>> > > > >>> continue stressing them -- we could even aim to replace DML with the DSLs);
>>>>>>> > > > >>> etc.  Overall, we want to improve the ability of the user to work on a wide
>>>>>>> > > > >>> range of large-scale, distributed ML problems in a simple and easy manner
>>>>>>> > > > >>> on top of Spark.
>>>>>>> > > > >>> In the meantime, you could explore our recent open issues [1] and even
>>>>>>> > > > >>> begin discussions or contributions on any of the items. You could also
>>>>>>> > > > >>> view our recent roadmap discussion thread on the mailing list, starting
>>>>>>> > > > >>> with the first email [2]:
>>>>>>> > > > >>> [1]:
>>>>>>> > > > >>> https://issues.apache.org/jira/issues/?jql=project%20%3D%20SYSTEMML%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20updated%20DESC%2C%20priority%20DESC
>>>>>>> > > > >>> [2]:
>>>>>>> > > > >>> http://mail-archives.apache.org/mod_mbox/incubator-systemml-dev/201701.mbox/%3C9eb780f0-ff28-c702-117c-bad74059930d@gmail.com%3E
>>>>>>> > > > >>> - Mike
>>>>>>> > > > >>> --
>>>>>>> > > > >>> Michael W. Dusenberry
>>>>>>> > > > >>> GitHub: github.com/dusenberrymw
>>>>>>> > > > >>> LinkedIn: linkedin.com/in/mikedusenberry
>>>>>>> > > > >>> On Fri, Jan 6, 2017 at 12:34 PM, Luciano Resende <luckbr1975@gmail.com>
>>>>>>> > > > >>> wrote:
>>>>>>> > > > >>> > As some folks have described on this thread, it would be great to get you
>>>>>>> > > > >>> > familiarized with SystemML.
>>>>>>> > > > >>> >
>>>>>>> > > > >>> > In parallel, I would look for a mentor from the active committer list and
>>>>>>> > > > >>> > start working on a project proposal which could be based on the recent
>>>>>>> > > > >>> > Roadmap discussion [1].
>>>>>>> > > > >>> >
>>>>>>> > > > >>> > If you are looking for some guidance on how Apache participates in GSOC,
>>>>>>> > > > >>> > take a look at the following resources [2] and [3], and don't hesitate to
>>>>>>> > > > >>> > ask questions here.
>>>>>>> > > > >>> >
>>>>>>> > > > >>> >
>>>>>>> > > > >>> > [1]
>>>>>>> > > > >>> > https://www.mail-archive.com/dev@systemml.incubator.apache.org/msg01199.html
>>>>>>> > > > >>> > [2] http://community.apache.org/gsoc.html
>>>>>>> > > > >>> > [3]
>>>>>>> > > > >>> > http://www.slideshare.net/luckbr1975/how-mentoring-can-help-you-start-contributing-to-open-source
>>>>>>> > > > >>> >
>>>>>>> > > > >>> > On Thu, Jan 5, 2017 at 3:15 PM, Krishna Kalyan <krishnakalyan3@gmail.com>
>>>>>>> > > > >>> > wrote:
>>>>>>> > > > >>> >
>>>>>>> > > > >>> > > Hello Developers,
>>>>>>> > > > >>> > > I am Krishna, a 2nd year Masters student (MSc. in Data Mining) in
>>>>>>> > > > >>> > > Barcelona, studying at Université Polytechnique de Catalogne.
>>>>>>> > > > >>> > > I am interested in contributing to SystemML this year under the GSoC
>>>>>>> > > > >>> > > program. Could anyone please guide me on how to go about it? (I
>>>>>>> > > > >>> > > understand that I need to write a proposal)
>>>>>>> > > > >>> > >
>>>>>>> > > > >>> > > Related Experience:
>>>>>>> > > > >>> > > My masters is mostly focussed on data mining techniques. Before my
>>>>>>> > > > >>> > > masters, I was a data engineer with IBM (India). I was responsible for
>>>>>>> > > > >>> > > managing a 50 node Hadoop Cluster for more than a year. Most of my time
>>>>>>> > > > >>> > > was spent optimising and writing ETL (Apache Pig) jobs.
>>>>>>> > > > >>> > >
>>>>>>> > > > >>> > > I am the most comfortable with Python followed by R and Scala.
>>>>>>> > > > >>> > >
>>>>>>> > > > >>> > > My Webpage
>>>>>>> > > > >>> > > kkalyan.in
>>>>>>> > > > >>> > >
>>>>>>> > > > >>> > > My Spark Pull Requests
>>>>>>> > > > >>> > > https://github.com/apache/spark/pulls?utf8=%E2%9C%93&q=is%3Apr%20author%3Akrishnakalyan3%20
>>>>>>> > > > >>> > >
>>>>>>> > > > >>> > > Thank you so much,
>>>>>>> > > > >>> > > Krishna
>>>>>>> > > > >>> > >
>>>>>>> > > > >>> >
>>>>>>> > > > >>> >
>>>>>>> > > > >>> >
>>>>>>> > > > >>> > --
>>>>>>> > > > >>> > Luciano Resende
>>>>>>> > > > >>> > http://twitter.com/lresende1975
>>>>>>> > > > >>> > http://lresende.blogspot.com/
>>>>>>> > > > >>> >
>>>>>>> > > >
>>>>>>> > >
>>>>>>> > >
>>>>>>> > >
>>>>>>> >
>>>>>>> >
>>>>>>> >
>>>>>>> > --
>>>>>>> > Dr. Adina Crainiceanu
>>>>>>> > Associate Professor, Computer Science Department
>>>>>>> > United States Naval Academy
>>>>>>> > 410-293-6822
>>>>>>> > adina@usna.edu
>>>>>>> > http://www.usna.edu/Users/cs/adina/
>>>>>>> >
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
