mahout-dev mailing list archives

From satyam sinha <sunnypic...@gmail.com>
Subject Re: GSOC 2013 Aspirant on #MAHOUT-1177 and #MAHOUT-1179
Date Fri, 03 May 2013 00:32:42 GMT
Review Request :
I've submitted a very generalized proposal to ASF.
Is there some way I can confirm that it has been channeled and delivered to
mahout?

The proposal is as follows. Any advice is appreciated. (Perhaps I should
have provided a link instead?)

*Short description:* The main goal of this project is to refactor for
performance and ease of use, based on the Mahout API design decided by the
community. Additionally, provide info-graphic based documentation, and add
or redesign tests, examples, and benchmarks.

*Problem Description*

There is a need for restructuring the Mahout API to provide streamlined
input and output formats, and an intuitive structuring of the class and
project hierarchy. Several related projects may need a common interface
with regularized prototypes. We need to redesign several tests, benchmarks
and examples; and to add these in case they are not present.



*Deliverables*

   - Clean and optimized API
   - Documentation with info-graphics and dependency charts.
   - New tests and benchmarks



*Design Document*

The design of the new API is expected to emerge well before the coding
phase starts, based on ongoing discussions on the mailing lists. I will set
up a wiki that allows easy access and exchange of opinions on design.

I am a huge believer in info-graphics and will include the design graphic
and dependency graph in the Mahout documentation.



*Approach*

Largely IDE-based development with the help of integrated tools. I intend to
use the CLI for writing and editing scripts.



*Timeline*

The summer break is on, so I am essentially free until mid-July and have a
lot of time that I can devote to my project. I can commit to over 40 hours
every week. Regular classes resume thereafter (which are no hindrance).



*Pre-Coding Phase:<3 weeks : 5 May - 26 May >*

Address a few PMD, FindBugs, Checkstyle, and Open Tasks items on Jenkins to
gain familiarity with the code-base and associated tools. Meanwhile, create
the refactoring roadmap based on open discussion in the Mahout community.



*Phase 1: <3 weeks : 27 May - 16 June >*

Restructure the code-base to the new API design. Provide regression testing
and redesign tests when required.



*Review 1: <1 week : 17 June - 23 June >*

Update the Mahout wiki and the documentation. Provide and run diagnostics.
Also document the tests and examples for beginners. Analyze profiler reports
and look for bottlenecks.



*Phase 2: <2 weeks : 24 June - 7 July >*

Write tests, examples, and benchmarks. Address community feedback on the
work in Phase 1.



*Review 2: <1 week : 8 July - 14 July >*

End-to-end testing. Fix outstanding bugs. Report on performance improvement.



*Phase 3: <4 weeks : 15 July - 12 August >*

I hope to have built a very good foundation by now. Re-commence integrated
development with concurrent testing and documentation. Resolve related JIRA
issues.



*Beyond GSOC:*

Remain associated with Mahout. Work towards becoming a committer.




Due to the nature of the project, the timeline may be subject to changes to
reflect variations in the roadmap. I am also open to tasks that my
mentor may see fit to assign me.

*References:*
https://issues.apache.org/jira/browse/MAHOUT-1177
https://issues.apache.org/jira/browse/MAHOUT-1179



*About Me*

I am an undergraduate student about to start the final year of the
4-year programme in Computer Science and Engineering at Birla Institute
of Technology, Mesra, India. I have a proper background in statistics,
object-oriented programming, and system architecture.

I endeavor to build a career in scalable data science. I have developed a
preliminary understanding of the Hadoop and Mahout APIs and hope to build
upon that knowledge as we progress through GSOC.



This is my first experience with open source and I will surely give my
best.


On Fri, May 3, 2013 at 5:59 AM, satyam sinha <sunnypic143@gmail.com> wrote:

> I lost a lot of time due to semester evaluations at my institute. (Perhaps
> I should have notified you.)
> The summer break has begun and now I have uninterrupted time to devote to
> GSOC.
>
> I have already set up hadoop-1.0.4 on opensuse-12.3,
> and Mahout 0.8-SNAPSHOT via svn on netbeans-7.3.
> I've been running the various examples and tests included.
> It took me almost a week (okay, I'm not a wizard!! :) ) to set up and go
> through various talks and slides.
> I need some insight on whether it is advisable to look into Avro now.
>
> TL;DR
> I was away for college, but am back now full-time.
> (May this not reflect badly upon me.)
> Will set up the wiki with my initial ideas in under 24 hours, so that we
> can all discuss the needs of the API.
>
>
> On Mon, Apr 8, 2013 at 12:23 PM, Isabel Drost-Fromm <isabel@apache.org> wrote:
>
>>
>> Hi Satyam,
>>
>> On Friday, April 05, 2013 09:54:56 PM satyam sinha wrote:
>> > Please give directions and suggestions to help me on my very first FOSS
>> > experience.
>>
>> I guess the best way to get started is to check out the source code,
>> build the project and get familiar with the code. Both issues you mention
>> in the subject are good for people with less experience in machine
>> learning and/or Hadoop.
>>
>> However both are pretty involved - you will need to understand the
>> existing code, come up with a good design for new APIs and discuss that
>> design with the community. So best to concentrate on just one of them.
>>
>> Feel free to also create a separate wiki page that contains a living
>> design document for the APIs that others can contribute to as well.
>>
>>
>> Isabel
>>
>
>
