mahout-dev mailing list archives

From satyam sinha <>
Subject Re: GSOC 2013 Aspirant on #MAHOUT-1177 and #MAHOUT-1179
Date Fri, 03 May 2013 00:32:42 GMT
Review Request :
I've submitted a very generalized proposal to ASF.
Is there some way I can confirm that it has been channeled and delivered to

The proposal is as follows. Any advice is appreciated. (Perhaps I should
have provided a link instead?)

*Short description:* The main goal of this project is to refactor for
performance and ease of use, based on the Mahout API design decided by the
community. Additionally, provide info-graphic based documentation, and add
or redesign tests, examples, and benchmarks.

*Problem Description*

There is a need for restructuring the Mahout API to provide streamlined
input and output formats, and an intuitive structuring of the class and
project hierarchy. Several related projects may need a common interface
with regularized prototypes. We need to redesign several tests, benchmarks
and examples; and to add these in case they are not present.


   - Clean and optimized API
   - Documentation with info-graphics and dependency charts.
   - New tests and benchmarks

*Design Document*

The design of the new API is expected to emerge well before the coding
phase starts, based on ongoing discussions on the mailing lists. I will set
up a wiki that allows easy access and exchange of opinions on design.

I am a huge believer in info-graphics and will include the design graphic
and dependency graph in the Mahout documentation.


Largely IDE-based development with the help of integrated tools. I intend to
resort to the CLI for writing and editing scripts.


The summer break is on, so I am essentially free until mid-July and have a
lot of time that I can devote to my project. I can commit to over 40 hours
every week. Regular classes resume thereafter (which are no hindrance).

*Pre-Coding Phase:<3 weeks : 5 May - 26 May >*

Address a few PMD, FindBugs, Checkstyle, and Open Tasks issues on Jenkins to
gain familiarity with the code-base and associated tools. Meanwhile, create
the refactoring road-map based on open discussion in the Mahout community.

*Phase 1: <3 weeks : 27 May - 16 June >*

Restructure the code-base to the new API design. Provide regression testing
and redesign tests where required.

*Review 1: <1 week : 17 June - 23 June >*

Update the Mahout wiki and the documentation. Provide and run diagnostics.
Also document the tests and examples for beginners. Analyze profiler
reports and look for bottlenecks.

*Phase 2: <2 weeks : 24 June - 7 July >*

Write tests, examples, and benchmarks. Address community feedback on the
work in Phase 1.

*Review 2: <1 week : 8 July - 14 July >*

End-to-end testing. Fix outstanding bugs. Report on performance improvement.

*Phase 3: <4 weeks : 15 July - 12 August >*

I hope to have built a very good foundation by now. Recommence integrated
development with concurrent testing and documentation. Resolve related JIRA

*Beyond GSOC:*

Remain associated with Mahout. Work towards becoming a committer.

Due to the nature of the project, the timeline may be subject to changes to
reflect variations in the roadmap. I am also open to tasks that my
mentor may see fit to assign me.


*About Me*

I am an undergraduate student about to start the final year of the
4-year programme in Computer Science and Engineering at Birla Institute
of Technology, Mesra, India. I have a proper background in statistics,
object-oriented programming, and system architecture.

I endeavor to build a career in scalable data science. I have developed a
preliminary understanding of the Hadoop and Mahout APIs and hope to build
upon that knowledge as we progress through GSOC.

This is my first experience with Open-source and I will surely give my

On Fri, May 3, 2013 at 5:59 AM, satyam sinha <> wrote:

> I lost a lot of time due to semester evaluations at my institute. (I should
> have notified perhaps.)
> The summer break has begun and now I have uninterrupted time to devote to
> I have already set up hadoop-1.0.4 on opensuse-12.3.
> Mahout 0.8-SNAPSHOT via svn on netbeans-7.3
> I've been running various examples and tests included.
> It took me almost a week (Okay, I'm not a wizard!! :) ) to set up and go
> through various talks and slides.
> I need some insight on whether it is advisable to look into Avro now.
> I was away for college, but am back now full-time.
> (May this not reflect badly upon me.)
> Will set up the wiki with my initial ideas in under 24 hours, so that we
> can all discuss needs of the API.
> On Mon, Apr 8, 2013 at 12:23 PM, Isabel Drost-Fromm <> wrote:
>> Hi Satyam,
>> On Friday, April 05, 2013 09:54:56 PM satyam sinha wrote:
>> > Please give directions and suggestions to help me on my very first FOSS
>> > experience.
>> I guess the best way to get started is to check out the source code, build
>> the project and get familiar with the code. Both issues you mention in the
>> subject are good for people with less experience in machine learning and/or
>> Hadoop. However both are pretty involved - you will need to understand the
>> existing code, come up with a good design for new APIs and discuss that
>> design with the community. So best to concentrate on just one of them.
>> Feel free to also create a separate wiki page that contains a living design
>> document for the APIs that others can contribute to as well.
>> Isabel
