systemml-dev mailing list archives

From Nakul Jindal <naku...@gmail.com>
Subject Re: GSoc 2017
Date Mon, 03 Apr 2017 08:37:57 GMT
Your project proposal looks great. Be sure to submit the final project proposal wherever you need to.

Thanks,
Nakul

> On Apr 2, 2017, at 4:08 PM, Krishna Kalyan <krishnakalyan3@gmail.com> wrote:
> 
> Hello All,
> I have updated the proposal. I hope this one is better. Please share your feedback.
> 
> https://docs.google.com/document/d/1DKWZTWvrvs73GYa1q3XEN5GFo8ALGjLH2DrIfRsJksA/edit#
> 
> FYI: the student application deadline is April 3, 16:00 UTC.
> 
> 
> Regards,
> Krishna
> 
>> On Sun, Apr 2, 2017 at 2:39 PM, Krishna Kalyan <krishnakalyan3@gmail.com> wrote:
>> Hello Nakul,
>> My comments are in italics below.
>> 
>>> On Sat, Apr 1, 2017 at 11:27 PM, Nakul Jindal <nakul02@gmail.com> wrote:
>>> Hi Krishna,
>>> 
>>> Here are some questions/remarks I have about parts of your proposal:
>>> 
>>> In the section titled Summary -
>>> 
>>> "The systematic evaluation of performance can be measured with performance tests and micro-benchmarks"
>>> We currently do not have any micro-benchmarks. Do you plan on adding any? (It would be awesome, but remember to keep the number of tasks reasonable given the time frame and your familiarity with the project.)
>> - Removed micro-benchmarks from the proposal.
>>> 
>>> Your summary section feels like it's generally applicable to performance testing on any project, which is good. However, when it comes to what you'd actually be doing, I see: "build a benchmark infrastructure and conduct experiments, that compare different choices in critical parts (sparsity thresholds, optimisation decisions, etc..)".
>> 
>> -  I agree and have made these changes.
>> 
>>> Going over each point:
>>> 
>>> 1. "build a benchmark infrastructure" - ok, I guess this subsumes pretty much all the tasks involved
>>> 2. "conduct experiments" - sure, although I think you mean testing your benchmarking infrastructure; please correct me if this is not what you meant
>>> 3. "that compare different choices in critical parts"
>>>   a. "sparsity thresholds" - awesome. You'd need to figure out what SystemML already does and what to add.
>>>   b. "optimization decisions" - could you provide an example or two of what exactly you mean by this? Do you mean enabling and/or disabling certain optimizations, running the perf suite, and automating the process? Or something else?
>>>   c. "etc" - more detail would be nice here. It would be good to know what exactly you are committing to.
>> - I will add more details in this section.
>>> 
>>> In the section titled Deliverables - 
>>> 
>>> You mention
>>> - "automation for all performance tests" - awesome! this is the primary task
>>> - "automatic scripts to test performance on a cloud provider" - this is great
>>> - "web dashboard" - awesome! this is a nice-to-have
>>> 
>>> But before the "cloud provider" and "web dashboard" tasks, we'd like to robustly check for errors, record performance numbers and generate reports (tasks 2-6 on https://issues.apache.org/jira/browse/SYSTEMML-1451). I see that you've mentioned some of these tasks in your "Project milestones" section as "Understand metrics to be captured like time, memory, errors". It'd be good to put them here as well.
>> - Will add this information under Deliverables
>>> 
>>> Remember, you might also need to change the way SystemML reports errors and performance numbers to complete your tasks. You, along with the currently active members of SystemML, might need to change the algorithms being tested as well.
>> 
>> - Sure, I will keep this in mind and account for it in the proposal.
>>> 
>>> In the section titled "Project Milestones" - 
>>> Your project timeline looks good: the initial set of things to do before May 30, and the fact that you've set aside the final week as a buffer. You have drilled down into a week-by-week schedule, which is good. I have some suggestions though:
>>> 
>>> You need to 
>>> T1. Understand what is happening now, try it out for yourself
>> 
>> - Yes, I am following the documentation to simulate benchmarks on my local system.
>> 
>>> T2. You need to automate this process
>>> T3. You need to test that this automated process works as expected (and make it robust)
>>> T4. You need to add additional capabilities (like micro-benchmarks and/or parameterizing the tests and/or running them with sparse and dense sets)
>> 
>> - I will account for T3 and T4 more explicitly in my proposal.
>>  
>>> For each of the tasks that you mention in your deliverables, could you please think about how you'd spend each week doing either T1-T3 for a deliverable that is currently done manually, or T4 for one that is not done at all right now?
>>> Please revisit some of the tasks on your timeline with this in mind.
>>> 
>>> I'd also ask that you set some deliverable(s) for phase 1 (due June 26), phase 2 (due July 26) and the final phase (ending Aug 29).
>>> 
>>> A suggestion for the deliverables, if you wanted to be really ambitious and complete every task possible:
>>> Phase 1 - implement infrastructure to launch the perf suite, detect errors, and report performance numbers in a plain text file
>>> Phase 2 - implement scripts to compare performance against older versions of SystemML and other packages (Spark MLlib), and implement a mechanism to generate report(s) with errors and performance information in a spreadsheet, a PDF, or on a web interface
>>> Phase 3 - add additional perf tests for more algorithms, different sparsity thresholds and optimization levels, and include them in the reports. Also implement and test scripts to run the perf suite on a cloud provider, doing this through a web UI.
>>> 
>>> Something very conservative could be:
>>> Phase 1 - automate perf suite and report perf numbers
>>> Phase 2 - make error reporting and handling robust, compare against previous versions of SystemML
>>> Phase 3 - add additional algorithms to the test suite
>> 
>> - I would prefer taking the conservative approach here.
>>> 
>>> These are just suggestions; tweak them as you see fit.
>>> Having a deliverable attached to the end of a phase is a good thing. 
>>> 
>>> I hope I am not being too critical, and that this helps.
>> 
>> - Not at all, I appreciate your detailed feedback.
>> 
>> - Could you also let me know the co-mentors for this project? I am working on the proposal and will share an updated version soon.
>>  
>>> -Nakul
>>> 
>>> 
>>> 
>>> 
>>>> On Fri, Mar 31, 2017 at 5:13 PM, Krishna Kalyan <krishnakalyan3@gmail.com> wrote:
>>>> Hello All,
>>>> Based on SYSTEMML-1451 and the relevant SystemML source code, I have updated the draft proposal. Please have a look and share your valuable feedback.
>>>> 
>>>> https://docs.google.com/document/d/1DKWZTWvrvs73GYa1q3XEN5GFo8ALGjLH2DrIfRsJksA/edit?usp=sharing
>>>> 
>>>> Regards,
>>>> Krishna
>>>> 
>>>>> On Thu, Mar 30, 2017 at 8:20 PM, Krishna Kalyan <krishnakalyan3@gmail.com> wrote:
>>>>> Hello All,
>>>>> I have created a proposal for 
>>>>> 
>>>>> d) Perftest : automated performance tests of algorithms
>>>>> (I am most comfortable with bash scripting and Python)
>>>>> 
>>>>> https://docs.google.com/document/d/1DKWZTWvrvs73GYa1q3XEN5GFo8ALGjLH2DrIfRsJksA/edit?usp=sharing
>>>>> 
>>>>> Please share your feedback on the proposal. It would be great if someone from the community could mentor it.
>>>>> 
>>>>> Regards,
>>>>> Krishna
>>>>> 
>>>>>> On Mon, Mar 27, 2017 at 6:07 PM, Krishna Kalyan <krishnakalyan3@gmail.com> wrote:
>>>>>> Thanks Nakul,
>>>>>> Replied to the JIRA thread.
>>>>>> 
>>>>>> Cheers,
>>>>>> Krishna
>>>>>> 
>>>>>>> On Mon, Mar 27, 2017 at 2:51 PM, Nakul Jindal <nakul02@gmail.com> wrote:
>>>>>>> Hi Krishna,
>>>>>>> 
>>>>>>> We have 2 proposals up:
>>>>>>> https://issues.apache.org/jira/issues/?filter=12339687&jql=project%20%3D%20SYSTEMML%20AND%20labels%20%3D%20gsoc2017%20ORDER%20BY%20created%20DESC
>>>>>>> 
>>>>>>> Would you be interested in any of these?
>>>>>>> If you are specifically interested in the Python DSL project, we can look for more volunteers or I could just volunteer to mentor it.
>>>>>>> 
>>>>>>> -Nakul
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>>> On Fri, Mar 24, 2017 at 12:05 PM, Nakul Jindal <nakul02@gmail.com> wrote:
>>>>>>>> Hi Krishna, 
>>>>>>>> 
>>>>>>>> We are working on putting together some proposals. I created one for a GPU-based project.
>>>>>>>> https://issues.apache.org/jira/browse/SYSTEMML-1436
>>>>>>>> Be on the lookout for more.
>>>>>>>> 
>>>>>>>> Thanks,
>>>>>>>> Nakul
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>>> On Tue, Mar 21, 2017 at 10:01 AM, Krishna Kalyan <krishnakalyan3@gmail.com> wrote:
>>>>>>>>> Hello Adina and Arvind, thank you for your replies.
>>>>>>>>> I am open to writing a proposal with a mentor and would appreciate it if we could take action quickly on this.
>>>>>>>>> 
>>>>>>>>> Best Regards,
>>>>>>>>> Krishna
>>>>>>>>> 
>>>>>>>>> On Sun, Mar 19, 2017 at 6:02 PM, Adina Crainiceanu <adina@usna.edu> wrote:
>>>>>>>>> 
>>>>>>>>> > The Apache Software Foundation applied and was accepted for GSoC. I believe SystemML could still participate as part of the ASF if interested (record your ideas in JIRA and add the gsoc2017 label). See messages on this subject from Ulrich Stark on the community.apache.org mailing list.
>>>>>>>>> > The following page also has useful info, even if it is not updated for this year: http://community.apache.org/gsoc.html - mentors need to register very soon.
>>>>>>>>> >
>>>>>>>>> > Best regards,
>>>>>>>>> > Adina
>>>>>>>>> >
>>>>>>>>> >
>>>>>>>>> > On Sun, Mar 19, 2017 at 3:51 PM, Arvind Surve <acs_s@yahoo.com.invalid> wrote:
>>>>>>>>> >
>>>>>>>>> > > Thanks Krishna for your interest.
>>>>>>>>> > > Unfortunately we could not submit a topic to GSoC in time. However, please feel free to leverage SystemML for your use cases and to contribute to SystemML where possible.
>>>>>>>>> > > Please let us know if you have any question.
>>>>>>>>> > >
>>>>>>>>> > > Arvind Surve | Spark Technology Center  | http://www.spark.tc/
>>>>>>>>> > >
>>>>>>>>> > >       From: Krishna Kalyan <krishnakalyan3@gmail.com>
>>>>>>>>> > >  To: dev@systemml.incubator.apache.org
>>>>>>>>> > >  Sent: Saturday, March 18, 2017 8:18 AM
>>>>>>>>> > >  Subject: Re: GSoc 2017
>>>>>>>>> > >
>>>>>>>>> > > Hello All,
>>>>>>>>> > > A gentle ping. Student applications open in a couple of days. I would like to work on 'Support for Python DSLs'.
>>>>>>>>> > > However, for now I am not sure how to proceed.
>>>>>>>>> > >
>>>>>>>>> > > Thank you,
>>>>>>>>> > > Krishna
>>>>>>>>> > >
>>>>>>>>> > > On Thu, Jan 12, 2017 at 6:08 PM, <dusenberrymw@gmail.com> wrote:
>>>>>>>>> > >
>>>>>>>>> > > > Yeah, helping to build out our Python DSL into a full-out replacement for the current "DML" language would be great, and we'd be quite supportive!
>>>>>>>>> > > >
>>>>>>>>> > > > -Mike
>>>>>>>>> > > >
>>>>>>>>> > > > --
>>>>>>>>> > > >
>>>>>>>>> > > > Mike Dusenberry
>>>>>>>>> > > > GitHub: github.com/dusenberrymw
>>>>>>>>> > > > LinkedIn: linkedin.com/in/mikedusenberry
>>>>>>>>> > > >
>>>>>>>>> > > > Sent from my iPhone.
>>>>>>>>> > > >
>>>>>>>>> > > >
>>>>>>>>> > > > > On Jan 12, 2017, at 2:58 PM, fschueler@posteo.de wrote:
>>>>>>>>> > > > >
>>>>>>>>> > > > > Hi Krishna,
>>>>>>>>> > > > >
>>>>>>>>> > > > > Cool to see that you're interested in SystemML!
>>>>>>>>> > > > >
>>>>>>>>> > > > > From your list, I personally think that a) and d) would be well suited for projects; a good Python DSL in particular is a high priority.
>>>>>>>>> > > > >
>>>>>>>>> > > > > We will apply as an organization to GSoC once organization applications are open (Jan. 19th), and I think we will find mentors for at least a) and d). If you already want to take a look at what is currently there, I suggest looking at our Python APIs and documentation. If you want to take on the DSL project, it might also be a good idea to look into the DML documentation and related papers to see what we need to support.
>>>>>>>>> > > > >
>>>>>>>>> > > > > The proposals will probably circulate on the mailing list, too, so keep an eye on that :)
>>>>>>>>> > > > >
>>>>>>>>> > > > > -Felix
>>>>>>>>> > > > >
>>>>>>>>> > > > > On 12.01.2017 23:13, Krishna Kalyan wrote:
>>>>>>>>> > > > >> Hello All,
>>>>>>>>> > > > >> Thank you for your wonderful replies.
>>>>>>>>> > > > >> Tasks that I am interested in:
>>>>>>>>> > > > >> a) Support for Python DSLs
>>>>>>>>> > > > >> b) Python wrappers for all existing algorithms
>>>>>>>>> > > > >> c) GPU support
>>>>>>>>> > > > >> d) Perftest: automated performance tests of algorithms
>>>>>>>>> > > > >> I am also willing to work on the tasks that the SystemML community thinks are important.
>>>>>>>>> > > > >> Regards,
>>>>>>>>> > > > >> Krishna
>>>>>>>>> > > > >> On Fri, Jan 6, 2017 at 10:14 PM, Mike Dusenberry <dusenberrymw@gmail.com> wrote:
>>>>>>>>> > > > >>> Hi Krishna! Welcome, and thanks for your interest!
>>>>>>>>> > > > >>> We would definitely be excited to collaborate with you on a GSoC project. We've started another thread to discuss possible new proposals, and we would also be quite interested in any particular proposal that you might like to generate, tailored towards your interests. Copied from the other thread, some possible ideas could include: building out a full ML demo to solve a real, large-scale problem that would benefit from a distributed approach; overall performance improvements that address a full class, or wider area, of ML algorithms, rather than a single, specific script; infrastructure for [performance] testing, and identification of wide areas of improvement; helping with building out fully-featured, clean, well-tested DSLs in Python & Scala (we've started, but it would be good to continue stressing them -- we could even aim to replace DML with the DSLs); etc. Overall, we want to improve the ability of the user to work on a wide range of large-scale, distributed ML problems in a simple and easy manner on top of Spark.
>>>>>>>>> > > > >>> In the meantime, you could explore our recent open issues [1] and even begin discussions or contributions on any of the items. You could also view our recent roadmap discussion thread on the mailing list, starting with the first email [2]:
>>>>>>>>> > > > >>> [1]: https://issues.apache.org/jira/issues/?jql=project%20%3D%20SYSTEMML%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20updated%20DESC%2C%20priority%20DESC
>>>>>>>>> > > > >>> [2]: http://mail-archives.apache.org/mod_mbox/incubator-systemml-dev/201701.mbox/%3C9eb780f0-ff28-c702-117c-bad74059930d@gmail.com%3E
>>>>>>>>> > > > >>> - Mike
>>>>>>>>> > > > >>> --
>>>>>>>>> > > > >>> Michael W. Dusenberry
>>>>>>>>> > > > >>> GitHub: github.com/dusenberrymw
>>>>>>>>> > > > >>> LinkedIn: linkedin.com/in/mikedusenberry
>>>>>>>>> > > > >>> On Fri, Jan 6, 2017 at 12:34 PM, Luciano Resende <luckbr1975@gmail.com> wrote:
>>>>>>>>> > > > >>> > As some folks have described on this thread, it would be great to get you familiarized with SystemML.
>>>>>>>>> > > > >>> >
>>>>>>>>> > > > >>> > In parallel, I would look for a mentor from the active committer list and start working on a project proposal, which could be based on the recent Roadmap discussion [1].
>>>>>>>>> > > > >>> >
>>>>>>>>> > > > >>> > If you are looking for some guidance on how Apache participates in GSoC, take a look at the following resources [2] and [3], and don't hesitate to ask questions here.
>>>>>>>>> > > > >>> >
>>>>>>>>> > > > >>> >
>>>>>>>>> > > > >>> > [1] https://www.mail-archive.com/dev@systemml.incubator.apache.org/msg01199.html
>>>>>>>>> > > > >>> > [2] http://community.apache.org/gsoc.html
>>>>>>>>> > > > >>> > [3] http://www.slideshare.net/luckbr1975/how-mentoring-can-help-you-start-contributing-to-open-source
>>>>>>>>> > > > >>> >
>>>>>>>>> > > > >>> > On Thu, Jan 5, 2017 at 3:15 PM, Krishna Kalyan <krishnakalyan3@gmail.com> wrote:
>>>>>>>>> > > > >>> >
>>>>>>>>> > > > >>> > > Hello Developers,
>>>>>>>>> > > > >>> > > I am Krishna, currently a 2nd-year Master's student (MSc in Data Mining) in Barcelona, studying at Université Polytechnique de Catalogne.
>>>>>>>>> > > > >>> > > I am interested in contributing to SystemML this year under the GSoC program.
>>>>>>>>> > > > >>> > > Could anyone please guide me on how to go about it? (I understand that I need to write a proposal.)
>>>>>>>>> > > > >>> > >
>>>>>>>>> > > > >>> > > Related Experience:
>>>>>>>>> > > > >>> > > My master's is mostly focused on data mining techniques. Before my master's, I was a data engineer with IBM (India), where I was responsible for managing a 50-node Hadoop cluster for more than a year. Most of my time was spent optimising and writing ETL (Apache Pig) jobs.
>>>>>>>>> > > > >>> > >
>>>>>>>>> > > > >>> > > I am most comfortable with Python, followed by R and Scala.
>>>>>>>>> > > > >>> > >
>>>>>>>>> > > > >>> > > My Webpage
>>>>>>>>> > > > >>> > > kkalyan.in
>>>>>>>>> > > > >>> > >
>>>>>>>>> > > > >>> > > My Spark Pull Requests
>>>>>>>>> > > > >>> > > https://github.com/apache/spark/pulls?utf8=%E2%9C%93&q=is%3Apr%20author%3Akrishnakalyan3%20
>>>>>>>>> > > > >>> > >
>>>>>>>>> > > > >>> > > Thank you so much,
>>>>>>>>> > > > >>> > > Krishna
>>>>>>>>> > > > >>> > >
>>>>>>>>> > > > >>> >
>>>>>>>>> > > > >>> >
>>>>>>>>> > > > >>> >
>>>>>>>>> > > > >>> > --
>>>>>>>>> > > > >>> > Luciano Resende
>>>>>>>>> > > > >>> > http://twitter.com/lresende1975
>>>>>>>>> > > > >>> > http://lresende.blogspot.com/
>>>>>>>>> > > > >>> >
>>>>>>>>> > > >
>>>>>>>>> > >
>>>>>>>>> > >
>>>>>>>>> > >
>>>>>>>>> >
>>>>>>>>> >
>>>>>>>>> >
>>>>>>>>> > --
>>>>>>>>> > Dr. Adina Crainiceanu
>>>>>>>>> > Associate Professor, Computer Science Department
>>>>>>>>> > United States Naval Academy
>>>>>>>>> > 410-293-6822
>>>>>>>>> > adina@usna.edu
>>>>>>>>> > http://www.usna.edu/Users/cs/adina/
>>>>>>>>> >
>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>>> 
>> 
> 
