Subject: Re: Mahout contributions
From: Khurrum Nasim
Date: Thu, 28 Apr 2016 13:21:46 -0400
Message-Id: <98999D7C-04B5-40B8-BDE7-5CD19FB5C098@useitc.com>
To: dev@mahout.apache.org

@Saikat - why use EL instead of Lucene directly?

> On Apr 28, 2016, at 12:08 PM, Saikat Kanjilal wrote:
>
> This is great information, thank you. Based on this recommendation I won't create a JIRA yet but will start work on my project, and when the code approaches the percentages you are describing I will create the appropriate JIRAs and put together a proposal to send to the list. Sound OK? Based on your latest updates to the wiki I will work on a handful of the clustering algorithms, since I see that the Spark implementations for these are not yet complete.
> Thank you again
>
>> From: ap.dev@outlook.com
>> To: dev@mahout.apache.org
>> Subject: Re: Mahout contributions
>> Date: Thu, 28 Apr 2016 01:31:09 +0000
>>
>> Saikat,
>>
>> One other thing that I should say is that you do not need clearance or input from the committers to begin work on your project, and the interest can and should come from the community as a whole. You can write a proposal as you've done, and if you don't see any "+1"s or responses from the community as a whole within a few days, you may want to explain in more detail and give examples and use cases. If you are still not seeing +1s or any responses from others, then I think you can assume that there may not be interest; this is usually how things work.
>>
>> However, if it's something that you're passionate about and you feel like you can deliver, this should not stop you. People do not always read the dev@ emails or have time to respond. You can still move forward with your proposed contribution by following the steps laid out in my previous email; follow the protocol at:
>>
>> http://mahout.apache.org/developers/how-to-contribute.html
>>
>> and create a JIRA. When you have reached a significant amount of completion (around 70-80%), open a PR for review; this way you can explain in more detail.
>>
>> But please realize that when you open a JIRA for a new issue there is some expectation of a commitment on your part to complete it.
>>
>> For example, I am currently investigating some new plotting features. I have spent a good deal of time this week and last already and am even mocking up code as a sketch of what may become an implementation before I open a "New Feature" JIRA for it.
>>
>> My point is absolutely not to discourage you or anybody else from opening JIRAs for new features, but rather to let you know that when you open a JIRA for a new issue, it tells others that you are working on it, and thus may discourage someone else with a similar idea from contributing this feature. So it is best to open it once you've begun your work and are committed to it.
>>
>> Andy
>>
>> ________________________________________
>> From: Saikat Kanjilal
>> Sent: Wednesday, April 27, 2016 8:24 PM
>> To: dev@mahout.apache.org
>> Subject: RE: Mahout contributions
>>
>> Andrew, thank you very much for your input. I actually want to start a new set of JIRAs. Here's what I want to work on: I want to build a framework that ties together search/visualization capability with some machine learning algorithms. Essentially, think of it as tying Elasticsearch and Kibana into Mahout: the user can search for their data with Elasticsearch, and for deeper analysis they can feed that data into one or more Mahout backends. Another interesting tie-in might be to hack Kibana to render ggplot-like graphics based on the output of Mahout algorithms (assuming this can be a Kibana plugin).
>> Before I go hog wild creating a bunch of JIRAs, I'd like to know if there's interest in this initiative. The tool will bring together the ELK stack with dynamic machine learning algorithms. I can go into a lot more detail around use cases if there's enough interest.
>> Looking forward to your and other committers' input. Thanks
>>
>>> From: ap.dev@outlook.com
>>> To: dev@mahout.apache.org
>>> Subject: Re: Mahout contributions
>>> Date: Wed, 27 Apr 2016 20:16:38 +0000
>>>
>>> Hello Saikat,
>>>
>>> #1 and #2 above are already implemented. #4 is tricky, so I would not recommend it without a strong knowledge of the codebase, and #5 is now deprecated. (I've just updated the algorithms grid to reflect this.) The algorithms page includes both algorithms implemented in the math-scala library and algorithms which have CLI drivers written for them.
>>>
>>> Please see: http://mahout.apache.org/developers/how-to-contribute.html
>>>
>>> And please note that per that documentation, it is in everybody's best interest to keep messages on list; contacting committers directly is discouraged.
>>>
>>> The best way to contribute (if you have not found a new bug or issue) would be for you to pick a single open issue in the Mahout JIRA which is not already assigned, and start work on it. When your work is ready for review, just open up a PR and the committers will review it. Please note that if you do pick up an issue to work on, we do expect some amount of responsibility and reliability and a tangible amount of satisfactory work, since once you've marked a JIRA as something you're working on, others will pass on it.
>>>
>>> Another good way to contribute would be to look for enhancements that you could make to existing code, not necessarily open JIRAs that need to be assigned to you. For example, please see the recent contribution and workflow on: https://issues.apache.org/jira/browse/MAHOUT-1833 .
>>>
>>> If you have something new that you'd like to implement, simply start a new JIRA issue and begin work on it. In this case, when you have some code that is ready for review, you can simply open up a PR for it and committers will review it. For new implementations, we generally say that you should do this when you are at least 70-80% finished with your coding.
>>>
>>> Thank You,
>>>
>>> Andy
>>>
>>> ________________________________________
>>> From: Saikat Kanjilal
>>> Sent: Tuesday, April 26, 2016 7:17 PM
>>> To: dev@mahout.apache.org
>>> Subject: RE: Mahout contributions
>>>
>>> Hello, following up on my last email with more specifics: I've looked through the wiki (https://mahout.apache.org/users/basics/algorithms.html) and I'm interested in implementing one or more of the following algorithms with Mahout using Spark: 1) Matrix Factorization with ALS 2) Naive Bayes 3) Weighted Matrix Factorization, SVD++ 4) Sparse TF-IDF Vectors from Text 5) Lucene integration.
>>> Had a few questions: 1) Which of these should I start with and where is there the greatest need? 2) Should I fork the repo and create branches for each of the above implementations? 3) Should I go ahead and create some JIRAs for these?
>>> Would love to have some pointers to get started. Regards
>>>
>>> From: sxk1969@hotmail.com
>>> To: dev@mahout.apache.org
>>> Subject: Mahout contributions
>>> Date: Wed, 30 Mar 2016 10:23:45 -0700
>>>
>>> Hello Committers, I was looking through the current JIRA tickets and was wondering if there's a particular area of Mahout that needs more help than others. Should I focus on contributing some algorithms using the DSL or on Samsara-related efforts? I've finally got some bandwidth to do some work and would love some guidance before assigning myself some tickets. Regards