mahout-dev mailing list archives

From Khurrum Nasim <>
Subject Re: Mahout contributions
Date Thu, 28 Apr 2016 17:21:46 GMT
@Saikat - why use EL instead of Lucene directly?

> On Apr 28, 2016, at 12:08 PM, Saikat Kanjilal <> wrote:
> This is great information, thank you. Based on this recommendation I won't create a JIRA
but will start work on my project, and when the code approaches the percentages you are describing
I will create the appropriate JIRAs and put together a proposal to send to the list. Does that
sound OK?  Based on your latest updates to the wiki I will work on a handful of the clustering algorithms,
since I see that the Spark implementations for these are not yet complete.
> Thank you again
>> From:
>> To:
>> Subject: Re: Mahout contributions
>> Date: Thu, 28 Apr 2016 01:31:09 +0000
>> Saikat, 
>> One other thing that I should say is that you do not need clearance or input from
the committers to begin work on your project, and the interest can and should come from the
community as a whole. You can write a proposal as you've done, and if you don't see any "+1"s
or responses from the community as a whole within a few days, you may want to explain in more
detail, give examples and use cases.  If you are still not seeing +1s or any responses from
others then I think you can assume that there may not be interest; this is usually how things
>> However, if it's something that you're passionate about and you feel like you can deliver,
this should not stop you.  People do not always read the dev@ emails or have time to respond.
 You can still move forward with your proposed contribution by following the steps laid out
in my previous email; follow the protocol at:
>> and create a JIRA.  When you have reached a significant amount of completion (around
70-80%), open a PR for review; this way you can explain in more detail.
>> But please realize that when you open a JIRA for a new issue there is some expectation
of a commitment on your part to complete it. 
>> For example, I am currently investigating some new plotting features.  I have spent
a good deal of time this week and last already and am even mocking up code as a sketch of
what may become an implementation before I open a "New Feature" JIRA for it.    
>> My point is absolutely not to discourage you or anybody else from opening JIRAs for
new features, but rather to let you know that when you open a JIRA for a new issue, it tells
others that you are working on it, and thus may discourage someone with a similar idea from
contributing this feature.  So it is best to open it once you've begun your work and are committed
to it.
>> Andy
>> ________________________________________
>> From: Saikat Kanjilal <>
>> Sent: Wednesday, April 27, 2016 8:24 PM
>> To:
>> Subject: RE: Mahout contributions
>> Andrew, thank you very much for your input. I actually want to start a new set of
JIRAs. Here's what I want to work on: a framework that ties together search/visualization
capability with some machine learning algorithms. Essentially, think of it as tying Elasticsearch
and Kibana into Mahout: the user can search for their data with Elasticsearch, and for deeper
analysis they can feed that data into one or more Mahout backends.
 Another interesting tie-in might be to hack Kibana to render ggplot-like graphics based on
the output of Mahout algorithms (assuming this can be a Kibana plugin).
>> Before I go hog wild creating a bunch of JIRAs, I'd like to know if there's interest
in this initiative.  The tool will bring together the ELK stack with dynamic machine learning
algorithms.  I can go into a lot more detail around use cases if there's enough interest.
>> Looking forward to your and other committers' input. Thanks
>>> From:
>>> To:
>>> Subject: Re: Mahout contributions
>>> Date: Wed, 27 Apr 2016 20:16:38 +0000
>>> Hello Saikat,
>>> #1 and #2 above are already implemented.  #4 is tricky, so I would not recommend it
without a strong knowledge of the codebase, and #5 is now deprecated.  (I've just updated
the algorithms grid to reflect this.)  The algorithms page includes both algorithms implemented
in the math-scala library and algorithms which have CLI drivers written for them.
>>> Please see:
>>> And please note that per that documentation, it is in everybody's best interest
to keep messages on list; contacting committers directly is discouraged.
>>> The best way to contribute (if you have not found a new bug or issue) would be
for you to pick a single open issue in the mahout JIRA which is not already assigned, and
start work on it.  When your work is ready for review, just open up a PR and the committers
will review it.  Please note that if you do pick up an issue to work on, we do expect some
amount of responsibility and reliability and a tangible amount of satisfactory work, since once
you've marked a JIRA as something you're working on, others will pass on it.
>>> Another good way to contribute would be to look for enhancements that could be made
to existing code, not necessarily open JIRAs that need to be assigned to you.  For example,
please see the recent contribution and workflow on:
>>> If you have something new that you'd like to implement, simply start a new JIRA
issue and begin work on it.  In this case, when you have some code that is ready for review,
 you can simply open up a PR for it and committers will review it.  For new implementations,
we generally say that you should do this when you are at least 70-80% finished with your coding.
>>> Thank You,
>>> Andy
>>> ________________________________________
>>> From: Saikat Kanjilal <>
>>> Sent: Tuesday, April 26, 2016 7:17 PM
>>> To:
>>> Subject: RE: Mahout contributions
>>> Hello, following up on my last email with more specifics: I've looked through
the wiki ( and I'm interested in implementing
one or more of the following algorithms with Mahout using Spark: 1) Matrix Factorization
with ALS 2) Naive Bayes 3) Weighted Matrix Factorization, SVD++ 4) Sparse TF-IDF Vectors from
Text 5) Lucene integration.
>>> I had a few questions: 1) Which of these should I start with, and where is
the greatest need? 2) Should I fork the repo and create branches for each of the above
implementations? 3) Should I go ahead and create some JIRAs for these?
>>> I would love to have some pointers to get started. Regards
>>> From:
>>> To:
>>> Subject: Mahout contributions
>>> Date: Wed, 30 Mar 2016 10:23:45 -0700
>>> Hello Committers, I was looking through the current JIRA tickets and was wondering
if there's a particular area of Mahout that needs more help than others. Should I focus
on contributing some algorithms using the DSL, or on Samsara-related efforts? I've finally got some
bandwidth to do some work and would love some guidance before assigning myself some tickets. Regards
