Mailing-List: contact dev-help@spark.apache.org; run by ezmlm
Precedence: bulk
MIME-Version: 1.0
References: <CAF7ADNrMmJHUAqjr6gosfN9Lk6COsBuQqGOe2sdekgd3_GjZqw@mail.gmail.com>
 <CA+=LZUrVsxCFxbdqxyFOGK24j2YRqpk0YpAVzEqccSHRgnS=qA@mail.gmail.com>
 <CANcNGZVc_nMm8Sf5D-HrT=aHVZ3EKN+gh2uo8ys=jMTSpOcWhw@mail.gmail.com>
 <BLUPR04MB77231AE1E2969CAE249A039917E0@BLUPR04MB772.namprd04.prod.outlook.com>
 <CAF7ADNq1J5G-+O_12qxhJWECwXqQaeZbibFsV4C115pBhBXfQw@mail.gmail.com>
 <CO2PR03MB23918B5CB2446778AB23E820BC750@CO2PR03MB2391.namprd03.prod.outlook.com>
 <CAMAsSdLWnZ_DjC59Rn7_=aB8wm=Ug-w-xU8JYQHNoRYEQxyvSA@mail.gmail.com>
 <CO2PR03MB23917E0E921A74C7CF33AF80BC740@CO2PR03MB2391.namprd03.prod.outlook.com>
 <CAMAsSdJyD-W1CEY__hkNt_154NRJDOXWZfyX28iQqBrh+pmp2Q@mail.gmail.com>
 <CAF7ADNryHnb2uxjYbLFKCN-HTnOg7H6Nu-wVs09kjgfHghjtow@mail.gmail.com>
 <CALD+6GPPE4m6Pyveo=30CMAQDbERcCGROh6itD_UFVAEO5DrpQ@mail.gmail.com> <CA+fQoq2fkNx8ZDAkq0ht1ajbAOg_uK_PVyWk4OYYGTsj7HsTjg@mail.gmail.com>
In-Reply-To: <CA+fQoq2fkNx8ZDAkq0ht1ajbAOg_uK_PVyWk4OYYGTsj7HsTjg@mail.gmail.com>
From: Nick Pentreath <nick.pentreath@gmail.com>
Date: Fri, 24 Feb 2017 08:28:16 +0000
Message-ID: <CALD+6GPcmBS+Ca3vhtTF7Z5ffW8CzekRdiH-538obtfcFe9cow@mail.gmail.com>
Subject: Re: Feedback on MLlib roadmap process proposal
To: "dev@spark.apache.org" <dev@spark.apache.org>
Content-Type: multipart/alternative; boundary=001a114a91ce9641d50549428384
archived-at: Fri, 24 Feb 2017 08:28:38 -0000

--001a114a91ce9641d50549428384
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

FYI I've started going through a few of the top Watched JIRAs and tried to
identify those that are obviously stale and can probably be closed, to try
to clean things up a bit.

On Thu, 23 Feb 2017 at 21:38 Tim Hunter <timhunter@databricks.com> wrote:

> As Sean wrote very nicely above, the changes made to Spark are decided in
> an organic fashion based on the interests and motivations of the
> committers and contributors. The case of deep learning is a good example.
> There is a lot of interest, and the core algorithms could be implemented
> without too much problem in a few thousand lines of Scala code. However,
> the performance of such a simple implementation would be one to two orders
> of magnitude slower than what you would get from the popular frameworks
> out there.
>
> At this point, there are probably more man-hours invested in TensorFlow
> (as an example) than in MLlib, so I think we need to be realistic about
> what we can expect to achieve inside Spark. Unlike BLAS for linear
> algebra, there is no agreed-upon interface for deep learning, and each of
> the XOnSpark flavors explores a slightly different design. It will be
> interesting to see what works well in practice. In the meantime, though,
> there are plenty of things that we could do to help developers of other
> libraries have a great experience with Spark. Matei alluded to that in
> his Spark Summit keynote when he mentioned better integration with
> low-level libraries.
>
> Tim
>
>
> On Thu, Feb 23, 2017 at 5:32 AM, Nick Pentreath <nick.pentreath@gmail.com>
> wrote:
>
> Sorry for being late to the discussion. I think Joseph, Sean and others
> have covered the issues well.
>
> Overall I like the proposed cleaned up roadmap & process (thanks Joseph!).
> As for the actual critical roadmap items mentioned on SPARK-18813, I think
> it makes sense and will comment a bit further on that JIRA.
>
> I would like to encourage votes & watching for issues to give a sense of
> what the community wants (I guess Vote is more explicit yet passive,
> while actually Watching an issue is more informative as it may indicate a
> real use case dependent on the issue?!).
>
> I think if used well this is valuable information for contributors. Of
> course not everything on that list can get done. But if I look through
> the top votes or watch list, while not all of those are likely to go in,
> a great many of the issues are fairly non-contentious in terms of being
> good additions to the project.
>
> Things like these are good examples IMO (I just sample a few of them, not
> exhaustive):
> - sample weights for RF / DT
> - multi-model and/or parallel model selection
> - make sharedParams public?
> - multi-column support for various transformers
> - incremental model training
> - tree algorithm enhancements
>
> Now, whether these can be prioritised in terms of bandwidth available to
> reviewers and committers is a totally different thing. But as Sean
> mentions there is some process there for trying to find the balance of
> the issue being a "good thing to add", a shepherd with bandwidth &
> interest in the issue to review, and the maintenance burden imposed.
>
> Let's take Deep Learning / NN for example. Here's a good example of
> something that has a lot of votes/watchers and as Sean mentions it is
> something that "everyone wants someone else to implement". In this case,
> much of the interest may in fact be "stale" - 2 years ago it would have
> been very interesting to have a strong DL impl in Spark. Now, because
> there are a plethora of very good DL libraries out there, how many of
> those Votes would be "deleted"? Granted few are well integrated with
> Spark but that can change and is changing (DL4J, BigDL, the "XonSpark"
> flavours etc).
>
> So this is something that I dare say will not be in Spark any time in the
> foreseeable future or perhaps ever given the current status. Perhaps it's
> worth seriously thinking about just closing these kinds of issues?
>
>
>
> On Fri, 27 Jan 2017 at 05:53 Joseph Bradley <joseph@databricks.com> wrote:
>
> Sean has given a great explanation.  A few more comments:
>
> Roadmap: I have been creating roadmap JIRAs, but the goal really is to
> have all committers working on MLlib help to set that roadmap, based on
> either their knowledge of current maintenance/internal needs of the
> project or the feedback given from the rest of the community.
> @Committers - I see people actively shepherding PRs for MLlib, but I
> don't see many major initiatives linked to the roadmap.  If there are
> ones large enough to merit adding to the roadmap, please do.
>
> In general, there are many process improvements we could make.  A few in
> my mind are:
> * Visibility: Let the community know what committers are focusing on.
> This was the primary purpose of the "MLlib roadmap proposal."
> * Community initiatives: This is currently very organic.  Some of the
> organic process could be improved, such as encouraging Votes/Watchers
> (though I agree with Sean about these being one-sided metrics).  Cody's
> SIP work is a great step towards adding more clarity and structure for
> major initiatives.
> * JIRA hygiene: Always a challenge, and always requires some manual
> prodding.  But it's great to push for efforts on this.
>
>
> On Wed, Jan 25, 2017 at 3:59 AM, Sean Owen <sowen@cloudera.com> wrote:
>
> On Wed, Jan 25, 2017 at 6:01 AM Ilya Matiach <ilmat@microsoft.com> wrote:
>
> My confusion was that the ML 2.2 roadmap critical features (
> https://issues.apache.org/jira/browse/SPARK-18813) did not line up with
> the top ML/MLLIB JIRAs by Votes
> <https://na01.safelinks.protection.outlook.com/?url=3Dhttps%3A%2F%2Fissue=
s.apache.org%2Fjira%2Fissues%2F%3Fjql%3Dproject%2520%253D%2520SPARK%2520AND=
%2520status%2520in%2520(Open%252C%2520%2522In%2520Progress%2522%252C%2520Re=
opened)%2520AND%2520component%2520in%2520(ML%252C%2520MLlib)%2520ORDER%2520=
BY%2520votes%2520DESC&data=3D02%7C01%7Cilmat%40microsoft.com%7C180d19608353=
4d9eee6b08d444754fae%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C636208718=
015178106&sdata=3D%2FtFB0LY%2BIxLoEf%2FPr1i1%2FgvrjlpXPuYLSLbpnd89Tkg%3D&re=
served=3D0>or
> Watchers
> <https://na01.safelinks.protection.outlook.com/?url=3Dhttps%3A%2F%2Fissue=
s.apache.org%2Fjira%2Fissues%2F%3Fjql%3Dproject%2520%253D%2520SPARK%2520AND=
%2520status%2520in%2520(Open%252C%2520%2522In%2520Progress%2522%252C%2520Re=
opened)%2520AND%2520component%2520in%2520(ML%252C%2520MLlib)%2520ORDER%2520=
BY%2520Watchers%2520DESC&data=3D02%7C01%7Cilmat%40microsoft.com%7C180d19608=
3534d9eee6b08d444754fae%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C636208=
718015178106&sdata=3DXkPfFiB2T%2FoVnJcdr3jf12dQjes7w%2BVJMrbhgx3ELRs%3D&res=
erved=3D0>
> .
>
> Your explanation that they do not have to and there is a more complex
> process to choosing the changes that will make it into the next release
> makes sense to me.
>
>
> For Spark ML, Joseph is the de facto leader and does publish a tentative
> roadmap. (We could also use JIRA mechanisms for this but any scheme is
> better than none.) Yes, not based on Votes -- nothing here is. Votes are
> a noisy signal because they usually measure: what would you like done if
> you didn't have to do it and there were no downsides for you?
>
>
>
> My only humble recommendation would be to clean up the top JIRAs by
> closing the ones which have Spark packages for them (e.g. the NN one which
> already has several packages as you explained), noting or somehow marking
> on some that they will not be resolved, and changing the component on the
> ones not related to ML/MLLIB (e.g.
> https://issues.apache.org/jira/browse/SPARK-12965).
>
>
> We do that. It occasionally generates protests, so, I find myself erring
> on the side of ignoring. You can comment on any JIRA you think should be
> closed. That's helpful.
>
> That particular JIRA seems potentially legitimate. I wouldn't close it.
> It also won't get fixed until someone proposes a resolution. I'd strongly
> encourage people saying "I have this problem too" to try to fix it. I
> tend to ignore these otherwise, myself, in favor of reviewing ones where
> someone has gone to the trouble of proposing a working fix.
>
>
>
> Also, I would love to do this if I had the permissions, but it would be
> great to change the JIRAs that are marked as “in progress” but where the
> corresponding pull request was closed/cancelled, for example
> https://issues.apache.org/jira/browse/SPARK-4638.  That JIRA is
>
>
> Yes, flag these. I or others can close them if appropriate. Anyone who
> consistently does this well, we could give JIRA permissions to.
>
> Opening a PR automatically makes it "In Progress" but there's no
> complementary process to un-mark it. You can ignore the Open / In
> Progress distinction really.
>
> This one is interesting because it does seem like a plausible feature to
> add. The original PR was abandoned by the author and nobody else
> submitted one -- despite the Votes. I hesitate to signal that no PRs
> would be considered, but, doesn't seem like it's in demand enough for
> someone to work on?
>
>
> I think one of my messages is that, de facto, here, like in many Apache
> projects, committers do not take requests. They pursue the work they
> believe needs doing, and shepherd work initiated by others (a clear bug
> report, a PR) to a resolution. Things get done by doing them, or by
> building influence by doing other things the project needs doing. It
> isn't a mechanical, objective process, and can't be. But it does work in
> a recognizable way.
>
>
>
>
> --
>
> Joseph Bradley
>
> Software Engineer - Machine Learning
>
> Databricks, Inc.
>
> [image: http://databricks.com] <http://databricks.com/>
>
>
>

--001a114a91ce9641d50549428384
Content-Type: text/html; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

<div dir=3D"ltr">FYI I&#39;ve started going through a few of the top Watche=
d JIRAs and tried to identify those that are obviously stale and can probab=
ly be closed, to try to clean things up a bit.<div><br><div class=3D"gmail_=
quote"><div dir=3D"ltr">On Thu, 23 Feb 2017 at 21:38 Tim Hunter &lt;<a href=
=3D"mailto:timhunter@databricks.com">timhunter@databricks.com</a>&gt; wrote=
:<br></div><blockquote class=3D"gmail_quote" style=3D"margin:0 0 0 .8ex;bor=
der-left:1px #ccc solid;padding-left:1ex"><div dir=3D"ltr" class=3D"gmail_m=
sg">As Sean wrote very nicely above, the changes made to Spark are decided =
in an organic fashion based on the interests and motivations of the committ=
ers and contributors. The case of deep learning is a good example. There is=
 a lot of interest, and the core algorithms could be implemented without to=
o much problem in a few thousands of lines of scala code. However, the perf=
ormance of such a simple implementation would be one to two order of magnit=
ude slower than what would get from the popular frameworks out there.<div c=
lass=3D"gmail_msg"><br class=3D"gmail_msg"></div><div class=3D"gmail_msg">A=
t this point, there are probably more man-hours invested in TensorFlow (as =
an example) than in MLlib, so I think we need to be realistic about what we=
 can expect to achieve inside Spark. Unlike BLAS for linear algebra, there =
is no agreed-up interface for deep learning, and each of the XOnSpark flavo=
rs explores a slightly different design. It will be interesting to see what=
 works well in practice. In the meantime, though, there are plenty of thing=
s that we could do to help developers of other libraries to have a great ex=
perience with Spark. Matei alluded to that in his Spark Summit keynote when=
 he mentioned better integration with low-level libraries.</div></div><div =
dir=3D"ltr" class=3D"gmail_msg"><div class=3D"gmail_msg"><br class=3D"gmail=
_msg"></div><div class=3D"gmail_msg">Tim</div><div class=3D"gmail_msg"><br =
class=3D"gmail_msg"></div></div><div class=3D"gmail_extra gmail_msg"><br cl=
ass=3D"gmail_msg"><div class=3D"gmail_quote gmail_msg">On Thu, Feb 23, 2017=
 at 5:32 AM, Nick Pentreath <span dir=3D"ltr" class=3D"gmail_msg">&lt;<a hr=
ef=3D"mailto:nick.pentreath@gmail.com" class=3D"gmail_msg" target=3D"_blank=
">nick.pentreath@gmail.com</a>&gt;</span> wrote:<br class=3D"gmail_msg"><bl=
ockquote class=3D"gmail_quote gmail_msg" style=3D"margin:0 0 0 .8ex;border-=
left:1px #ccc solid;padding-left:1ex"><div dir=3D"ltr" class=3D"gmail_msg">=
Sorry for being late to the discussion. I think Joseph, Sean and others hav=
e covered the issues well.=C2=A0<div class=3D"gmail_msg"><br class=3D"gmail=
_msg"></div><div class=3D"gmail_msg">Overall I like the proposed cleaned up=
 roadmap &amp; process (thanks Joseph!). As for the actual critical roadmap=
 items mentioned on=C2=A0SPARK-18813, I think it makes sense and will comme=
nt a bit further on that JIRA.</div><div class=3D"gmail_msg"><br class=3D"g=
mail_msg"></div><div class=3D"gmail_msg">I would like to encourage votes &a=
mp; watching for issues to give a sense of what the community wants (I gues=
s Vote is more explicit yet passive, while actually Watching an issue is mo=
re informative as it may indicate a real use case dependent on the issue?!)=
.</div><div class=3D"gmail_msg"><br class=3D"gmail_msg"></div><div class=3D=
"gmail_msg">I think if used well this is valuable information for contribut=
ors. Of course not everything on that list can get done. But if I look thro=
ugh the top votes or watch list, while not all of those are likely to go in=
, a great many of the issues are fairly non-contentious in terms of being g=
ood additions to the project.</div><div class=3D"gmail_msg"><br class=3D"gm=
ail_msg"></div><div class=3D"gmail_msg">Things like these are good examples=
 IMO (I just sample a few of them, not exhaustive):</div><div class=3D"gmai=
l_msg">- sample weights for RF / DT</div><div class=3D"gmail_msg">- multi-m=
odel and/or parallel model selection</div><div class=3D"gmail_msg">- make s=
haredParams public?</div><div class=3D"gmail_msg">- multi-column support fo=
r various transformers</div><div class=3D"gmail_msg">- incremental model tr=
aining</div><div class=3D"gmail_msg">- tree algorithm enhancements</div><di=
v class=3D"gmail_msg"><br class=3D"gmail_msg"></div><div class=3D"gmail_msg=
">Now, whether these can be prioritised in terms of bandwidth available to =
reviewers and committers is a totally different thing. But as Sean mentions=
 there is some process there for trying to find the balance of the issue be=
ing a &quot;good thing to add&quot;, a shepherd with bandwidth &amp; intere=
st in the issue to review, and the maintenance burden imposed.</div><div cl=
ass=3D"gmail_msg"><br class=3D"gmail_msg"></div><div class=3D"gmail_msg">Le=
t&#39;s take Deep Learning / NN for example. Here&#39;s a good example of s=
omething that has a lot of votes/watchers and as Sean mentions it is someth=
ing that &quot;everyone wants someone else to implement&quot;. In this case=
, much of the interest may in fact be &quot;stale&quot; - 2 years ago it wo=
uld have been very interesting to have a strong DL impl in Spark. Now, beca=
use there are a plethora of very good DL libraries out there, how many of t=
hose Votes would be &quot;deleted&quot;? Granted few are well integrated wi=
th Spark but that can and is changing (DL4J, BigDL, the &quot;XonSpark&quot=
; flavours etc).=C2=A0</div><div class=3D"gmail_msg"><br class=3D"gmail_msg=
"></div><div class=3D"gmail_msg">So this is something that I dare say will =
not be in Spark any time in the foreseeable future or perhaps ever given th=
e current status. Perhaps it&#39;s worth seriously thinking about just clos=
ing these kind of issues?</div><div class=3D"gmail_msg"><div class=3D"m_346=
6116339879802086h5 gmail_msg"><div class=3D"gmail_msg"><br class=3D"gmail_m=
sg"></div><div class=3D"gmail_msg"><br class=3D"gmail_msg"></div><div class=
=3D"gmail_msg"><br class=3D"gmail_msg"></div><div class=3D"gmail_msg"><div =
class=3D"gmail_quote gmail_msg"><div dir=3D"ltr" class=3D"gmail_msg">On Fri=
, 27 Jan 2017 at 05:53 Joseph Bradley &lt;<a href=3D"mailto:joseph@databric=
ks.com" class=3D"gmail_msg" target=3D"_blank">joseph@databricks.com</a>&gt;=
 wrote:<br class=3D"gmail_msg"></div><blockquote class=3D"gmail_quote gmail=
_msg" style=3D"margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1e=
x"><div dir=3D"ltr" class=3D"m_3466116339879802086m_1489654192441722201gmai=
l_msg gmail_msg">Sean has given a great explanation.=C2=A0 A few more comme=
nts:<div class=3D"m_3466116339879802086m_1489654192441722201gmail_msg gmail=
_msg"><br class=3D"m_3466116339879802086m_1489654192441722201gmail_msg gmai=
l_msg"></div><div class=3D"m_3466116339879802086m_1489654192441722201gmail_=
msg gmail_msg">Roadmap: I have been creating roadmap JIRAs, but the goal re=
ally is to have all committers working on MLlib help to set that roadmap, b=
ased on either their knowledge of current maintenance/internal needs of the=
 project or the feedback given from the rest of the community.</div><div cl=
ass=3D"m_3466116339879802086m_1489654192441722201gmail_msg gmail_msg">@Comm=
itters - I see people actively shepherding PRs for MLlib, but I don&#39;t s=
ee many major initiatives linked to the roadmap.=C2=A0 If there are ones la=
rge enough to merit adding to the roadmap, please do.</div><div class=3D"m_=
3466116339879802086m_1489654192441722201gmail_msg gmail_msg"><br class=3D"m=
_3466116339879802086m_1489654192441722201gmail_msg gmail_msg"></div><div cl=
ass=3D"m_3466116339879802086m_1489654192441722201gmail_msg gmail_msg">In ge=
neral, there are many process improvements we could make.=C2=A0 A few in my=
 mind are:</div><div class=3D"m_3466116339879802086m_1489654192441722201gma=
il_msg gmail_msg">* Visibility: Let the community know what committers are =
focusing on.=C2=A0 This was the primary purpose of the &quot;MLlib roadmap =
proposal.&quot;</div><div class=3D"m_3466116339879802086m_14896541924417222=
01gmail_msg gmail_msg">* Community initiatives: This is currently very orga=
nic.=C2=A0 Some of the organic process could be improved, such as encouragi=
ng Votes/Watchers (though I agree with Sean about these being one-sided met=
rics).=C2=A0 Cody&#39;s SIP work is a great step towards adding more clarit=
y and structure for major initiatives.</div><div class=3D"m_346611633987980=
2086m_1489654192441722201gmail_msg gmail_msg">* JIRA hygiene: Always a chal=
lenge, and always requires some manual prodding.=C2=A0 But it&#39;s great t=
o push for efforts on this.</div><div class=3D"m_3466116339879802086m_14896=
54192441722201gmail_msg gmail_msg"><br class=3D"m_3466116339879802086m_1489=
654192441722201gmail_msg gmail_msg"></div></div><div class=3D"gmail_extra m=
_3466116339879802086m_1489654192441722201gmail_msg gmail_msg"></div><div cl=
ass=3D"gmail_extra m_3466116339879802086m_1489654192441722201gmail_msg gmai=
l_msg"><br class=3D"m_3466116339879802086m_1489654192441722201gmail_msg gma=
il_msg"><div class=3D"gmail_quote m_3466116339879802086m_148965419244172220=
1gmail_msg gmail_msg">On Wed, Jan 25, 2017 at 3:59 AM, Sean Owen <span dir=
=3D"ltr" class=3D"m_3466116339879802086m_1489654192441722201gmail_msg gmail=
_msg">&lt;<a href=3D"mailto:sowen@cloudera.com" class=3D"m_3466116339879802=
086m_1489654192441722201gmail_msg gmail_msg" target=3D"_blank">sowen@cloude=
ra.com</a>&gt;</span> wrote:<br class=3D"m_3466116339879802086m_14896541924=
41722201gmail_msg gmail_msg"><blockquote class=3D"gmail_quote m_34661163398=
79802086m_1489654192441722201gmail_msg gmail_msg" style=3D"margin:0 0 0 .8e=
x;border-left:1px #ccc solid;padding-left:1ex"><div dir=3D"ltr" class=3D"m_=
3466116339879802086m_1489654192441722201gmail_msg gmail_msg"><div class=3D"=
gmail_quote m_3466116339879802086m_1489654192441722201gmail_msg gmail_msg">=
<span class=3D"m_3466116339879802086m_1489654192441722201gmail_msg gmail_ms=
g"><div dir=3D"ltr" class=3D"m_3466116339879802086m_1489654192441722201gmai=
l_msg gmail_msg">On Wed, Jan 25, 2017 at 6:01 AM Ilya Matiach &lt;<a href=
=3D"mailto:ilmat@microsoft.com" class=3D"m_3466116339879802086m_14896541924=
41722201gmail_msg gmail_msg" target=3D"_blank">ilmat@microsoft.com</a>&gt; =
wrote:</div><blockquote class=3D"gmail_quote m_3466116339879802086m_1489654=
192441722201gmail_msg gmail_msg" style=3D"margin:0 0 0 .8ex;border-left:1px=
 #ccc solid;padding-left:1ex"><div lang=3D"EN-US" link=3D"blue" vlink=3D"pu=
rple" class=3D"m_3466116339879802086m_1489654192441722201m_-945303160246519=
398m_7571632583529928415gmail_msg m_3466116339879802086m_148965419244172220=
1gmail_msg gmail_msg"><div class=3D"m_3466116339879802086m_1489654192441722=
201m_-945303160246519398m_7571632583529928415m_6498132320081888578WordSecti=
on1 m_3466116339879802086m_1489654192441722201m_-945303160246519398m_757163=
2583529928415gmail_msg m_3466116339879802086m_1489654192441722201gmail_msg =
gmail_msg">
<p class=3D"MsoNormal m_3466116339879802086m_1489654192441722201m_-94530316=
0246519398m_7571632583529928415gmail_msg m_3466116339879802086m_14896541924=
41722201gmail_msg gmail_msg">My confusion was that the ML 2.2 roadmap criti=
cal features (<a href=3D"https://issues.apache.org/jira/browse/SPARK-18813"=
 class=3D"m_3466116339879802086m_1489654192441722201m_-945303160246519398m_=
7571632583529928415gmail_msg m_3466116339879802086m_1489654192441722201gmai=
l_msg gmail_msg" target=3D"_blank">https://issues.apache.org/jira/browse/SP=
ARK-18813</a>) did not line up with the top ML/MLLIB JIRAs by
<span class=3D"m_3466116339879802086m_1489654192441722201m_-945303160246519=
398m_7571632583529928415m_6498132320081888578gmailmsg m_3466116339879802086=
m_1489654192441722201m_-945303160246519398m_7571632583529928415gmail_msg m_=
3466116339879802086m_1489654192441722201gmail_msg gmail_msg"><span lang=3D"=
EN" style=3D"font-size:10.5pt;font-family:&quot;Arial&quot;,sans-serif;colo=
r:#333333" class=3D"m_3466116339879802086m_1489654192441722201m_-9453031602=
46519398m_7571632583529928415gmail_msg m_3466116339879802086m_1489654192441=
722201gmail_msg gmail_msg"><a href=3D"https://na01.safelinks.protection.out=
look.com/?url=3Dhttps%3A%2F%2Fissues.apache.org%2Fjira%2Fissues%2F%3Fjql%3D=
project%2520%253D%2520SPARK%2520AND%2520status%2520in%2520(Open%252C%2520%2=
522In%2520Progress%2522%252C%2520Reopened)%2520AND%2520component%2520in%252=
0(ML%252C%2520MLlib)%2520ORDER%2520BY%2520votes%2520DESC&amp;data=3D02%7C01=
%7Cilmat%40microsoft.com%7C180d196083534d9eee6b08d444754fae%7C72f988bf86f14=
1af91ab2d7cd011db47%7C1%7C0%7C636208718015178106&amp;sdata=3D%2FtFB0LY%2BIx=
LoEf%2FPr1i1%2FgvrjlpXPuYLSLbpnd89Tkg%3D&amp;reserved=3D0" class=3D"m_34661=
16339879802086m_1489654192441722201m_-945303160246519398m_75716325835299284=
15gmail_msg m_3466116339879802086m_1489654192441722201gmail_msg gmail_msg" =
target=3D"_blank">Votes
</a>or <a href=3D"https://na01.safelinks.protection.outlook.com/?url=3Dhttp=
s%3A%2F%2Fissues.apache.org%2Fjira%2Fissues%2F%3Fjql%3Dproject%2520%253D%25=
20SPARK%2520AND%2520status%2520in%2520(Open%252C%2520%2522In%2520Progress%2=
522%252C%2520Reopened)%2520AND%2520component%2520in%2520(ML%252C%2520MLlib)=
%2520ORDER%2520BY%2520Watchers%2520DESC&amp;data=3D02%7C01%7Cilmat%40micros=
oft.com%7C180d196083534d9eee6b08d444754fae%7C72f988bf86f141af91ab2d7cd011db=
47%7C1%7C0%7C636208718015178106&amp;sdata=3DXkPfFiB2T%2FoVnJcdr3jf12dQjes7w=
%2BVJMrbhgx3ELRs%3D&amp;reserved=3D0" class=3D"m_3466116339879802086m_14896=
54192441722201m_-945303160246519398m_7571632583529928415gmail_msg m_3466116=
339879802086m_1489654192441722201gmail_msg gmail_msg" target=3D"_blank">
Watchers</a></span></span>.<u class=3D"m_3466116339879802086m_1489654192441=
722201m_-945303160246519398m_7571632583529928415gmail_msg m_346611633987980=
2086m_1489654192441722201gmail_msg gmail_msg"></u><u class=3D"m_34661163398=
79802086m_1489654192441722201m_-945303160246519398m_7571632583529928415gmai=
l_msg m_3466116339879802086m_1489654192441722201gmail_msg gmail_msg"></u></=
p>
<p class=3D"MsoNormal m_3466116339879802086m_1489654192441722201m_-94530316=
0246519398m_7571632583529928415gmail_msg m_3466116339879802086m_14896541924=
41722201gmail_msg gmail_msg">Your explanation that they do not have to and =
there is a more complex process to choosing the changes that will make it i=
nto the next release makes sense to me.</p></div></div></blockquote><div cl=
ass=3D"m_3466116339879802086m_1489654192441722201gmail_msg gmail_msg"><br c=
lass=3D"m_3466116339879802086m_1489654192441722201gmail_msg gmail_msg"></di=
v></span><div class=3D"m_3466116339879802086m_1489654192441722201gmail_msg =
gmail_msg">For Spark ML, Joseph is the de facto leader and does publish a t=
entative roadmap. (We could also use JIRA mechanisms for this but any schem=
e is better than none.) Yes, not based on Votes -- nothing here is. Votes a=
re noisy signal because it is usually measures: what would you like done if=
 you didn&#39;t have to do it and there were no downsides for you?</div><sp=
an class=3D"m_3466116339879802086m_1489654192441722201gmail_msg gmail_msg">=
<div class=3D"m_3466116339879802086m_1489654192441722201gmail_msg gmail_msg=
"><br class=3D"m_3466116339879802086m_1489654192441722201gmail_msg gmail_ms=
g"></div><div class=3D"m_3466116339879802086m_1489654192441722201gmail_msg =
gmail_msg">=C2=A0</div><blockquote class=3D"gmail_quote m_34661163398798020=
86m_1489654192441722201gmail_msg gmail_msg" style=3D"margin:0 0 0 .8ex;bord=
er-left:1px #ccc solid;padding-left:1ex"><div lang=3D"EN-US" link=3D"blue" =
vlink=3D"purple" class=3D"m_3466116339879802086m_1489654192441722201m_-9453=
03160246519398m_7571632583529928415gmail_msg m_3466116339879802086m_1489654=
192441722201gmail_msg gmail_msg"><div class=3D"m_3466116339879802086m_14896=
54192441722201m_-945303160246519398m_7571632583529928415m_64981323200818885=
78WordSection1 m_3466116339879802086m_1489654192441722201m_-945303160246519=
398m_7571632583529928415gmail_msg m_3466116339879802086m_148965419244172220=
1gmail_msg gmail_msg"><p class=3D"MsoNormal m_3466116339879802086m_14896541=
92441722201m_-945303160246519398m_7571632583529928415gmail_msg m_3466116339=
879802086m_1489654192441722201gmail_msg gmail_msg"><u class=3D"m_3466116339=
879802086m_1489654192441722201m_-945303160246519398m_7571632583529928415gma=
il_msg m_3466116339879802086m_1489654192441722201gmail_msg gmail_msg"></u><=
u class=3D"m_3466116339879802086m_1489654192441722201m_-945303160246519398m=
_7571632583529928415gmail_msg m_3466116339879802086m_1489654192441722201gma=
il_msg gmail_msg"></u></p>
<p class=3D"MsoNormal m_3466116339879802086m_1489654192441722201m_-94530316=
0246519398m_7571632583529928415gmail_msg m_3466116339879802086m_14896541924=
41722201gmail_msg gmail_msg"><a name=3D"m_3466116339879802086_m_14896541924=
41722201_m_-945303160246519398_m_7571632583529928415_m_6498132320081888578_=
_MailEndCompose" class=3D"m_3466116339879802086m_1489654192441722201m_-9453=
03160246519398m_7571632583529928415gmail_msg m_3466116339879802086m_1489654=
192441722201gmail_msg gmail_msg">My only humble recommendation would be to =
cleanup the top JIRAs by closing the ones which have spark packages for the=
m (eg the NN one which already has several packages as you explained), noti=
ng or somehow marking
 on some that they will not be resolved, and changing the component on the =
ones not related to ML/MLLIB (eg
</a><a href=3D"https://issues.apache.org/jira/browse/SPARK-12965" class=3D"=
m_3466116339879802086m_1489654192441722201m_-945303160246519398m_7571632583=
529928415gmail_msg m_3466116339879802086m_1489654192441722201gmail_msg gmai=
l_msg" target=3D"_blank"><span class=3D"m_3466116339879802086m_148965419244=
1722201m_-945303160246519398m_7571632583529928415gmail_msg m_34661163398798=
02086m_1489654192441722201gmail_msg gmail_msg">https://issues.apache.org/ji=
ra/browse/SPARK-12965</span><span class=3D"m_3466116339879802086m_148965419=
2441722201m_-945303160246519398m_7571632583529928415gmail_msg m_34661163398=
79802086m_1489654192441722201gmail_msg gmail_msg"></span></a><span class=3D=
"m_3466116339879802086m_1489654192441722201m_-945303160246519398m_757163258=
3529928415gmail_msg m_3466116339879802086m_1489654192441722201gmail_msg gma=
il_msg">).</span></p></div></div></blockquote><div class=3D"m_3466116339879=
802086m_1489654192441722201gmail_msg gmail_msg"><br class=3D"m_346611633987=
9802086m_1489654192441722201gmail_msg gmail_msg"></div></span><div class=3D=
"m_3466116339879802086m_1489654192441722201gmail_msg gmail_msg">We do that.=
 It occasionally generates protests, so, I find myself erring on the side o=
f ignoring. You can comment on any JIRA you think should be closed. That&#3=
9;s helpful.</div><div class=3D"m_3466116339879802086m_1489654192441722201g=
mail_msg gmail_msg"><br class=3D"m_3466116339879802086m_1489654192441722201=
gmail_msg gmail_msg"></div><div class=3D"m_3466116339879802086m_14896541924=
41722201gmail_msg gmail_msg">That particular JIRA seems potentially legitim=
ate. I wouldn&#39;t close it. It also won&#39;t get fixed until someone pro=
poses a resolution. I&#39;d strongly encourage people saying &quot;I have t=
his problem too&quot; to try to fix it. I tend to ignore these otherwise, m=
yself, in favor of reviewing ones where someone has gone to the trouble of =
proposing a working fix.</div><span class=3D"m_3466116339879802086m_1489654=
192441722201gmail_msg gmail_msg"><div class=3D"m_3466116339879802086m_14896=
54192441722201gmail_msg gmail_msg"><br class=3D"m_3466116339879802086m_1489=
654192441722201gmail_msg gmail_msg"></div><div class=3D"m_34661163398798020=
86m_1489654192441722201gmail_msg gmail_msg">=C2=A0</div><blockquote class=
=3D"gmail_quote m_3466116339879802086m_1489654192441722201gmail_msg gmail_m=
sg" style=3D"margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"=
><div lang=3D"EN-US" link=3D"blue" vlink=3D"purple" class=3D"m_346611633987=
9802086m_1489654192441722201m_-945303160246519398m_7571632583529928415gmail=
_msg m_3466116339879802086m_1489654192441722201gmail_msg gmail_msg"><div cl=
ass=3D"m_3466116339879802086m_1489654192441722201m_-945303160246519398m_757=
1632583529928415m_6498132320081888578WordSection1 m_3466116339879802086m_14=
89654192441722201m_-945303160246519398m_7571632583529928415gmail_msg m_3466=
116339879802086m_1489654192441722201gmail_msg gmail_msg"><p class=3D"MsoNor=
mal m_3466116339879802086m_1489654192441722201m_-945303160246519398m_757163=
2583529928415gmail_msg m_3466116339879802086m_1489654192441722201gmail_msg =
gmail_msg"><span class=3D"m_3466116339879802086m_1489654192441722201m_-9453=
03160246519398m_7571632583529928415gmail_msg m_3466116339879802086m_1489654=
192441722201gmail_msg gmail_msg"><u class=3D"m_3466116339879802086m_1489654=
192441722201m_-945303160246519398m_7571632583529928415gmail_msg m_346611633=
9879802086m_1489654192441722201gmail_msg gmail_msg"></u><u class=3D"m_34661=
16339879802086m_1489654192441722201m_-945303160246519398m_75716325835299284=
15gmail_msg m_3466116339879802086m_1489654192441722201gmail_msg gmail_msg">=
</u></span></p>
<p class=3D"MsoNormal m_3466116339879802086m_1489654192441722201m_-94530316=
0246519398m_7571632583529928415gmail_msg m_3466116339879802086m_14896541924=
41722201gmail_msg gmail_msg"><span class=3D"m_3466116339879802086m_14896541=
92441722201m_-945303160246519398m_7571632583529928415gmail_msg m_3466116339=
879802086m_1489654192441722201gmail_msg gmail_msg">Also, I would love to do=
 this if I had the permissions, but it would be great to change the JIRAs t=
hat are marked as =E2=80=9Cin progress=E2=80=9D but where the corresponding=
 pull request was closed/cancelled,
 for example </span><a href=3D"https://issues.apache.org/jira/browse/SPARK-=
4638" class=3D"m_3466116339879802086m_1489654192441722201m_-945303160246519=
398m_7571632583529928415gmail_msg m_3466116339879802086m_148965419244172220=
1gmail_msg gmail_msg" target=3D"_blank"><span class=3D"m_346611633987980208=
6m_1489654192441722201m_-945303160246519398m_7571632583529928415gmail_msg m=
_3466116339879802086m_1489654192441722201gmail_msg gmail_msg">https://issue=
s.apache.org/jira/browse/SPARK-4638</span><span class=3D"m_3466116339879802=
086m_1489654192441722201m_-945303160246519398m_7571632583529928415gmail_msg=
 m_3466116339879802086m_1489654192441722201gmail_msg gmail_msg"></span></a>=
<span class=3D"m_3466116339879802086m_1489654192441722201m_-945303160246519=
398m_7571632583529928415gmail_msg m_3466116339879802086m_148965419244172220=
1gmail_msg gmail_msg">.=C2=A0
 That JIRA is </span></p></div></div></blockquote><div class=3D"m_346611633=
9879802086m_1489654192441722201gmail_msg gmail_msg"><br class=3D"m_34661163=
39879802086m_1489654192441722201gmail_msg gmail_msg"></div></span><div clas=
s=3D"m_3466116339879802086m_1489654192441722201gmail_msg gmail_msg">Yes, fl=
ag these. I or others can close them if appropriate. We could give JIRA pe=
rmissions to anyone who consistently does this well.</div><div class=3D"m_3=
466116339879802086m_1489654192441722201gmail_msg gmail_msg"><br class=3D"m_=
3466116339879802086m_1489654192441722201gmail_msg gmail_msg"></div><div cla=
ss=3D"m_3466116339879802086m_1489654192441722201gmail_msg gmail_msg">Openin=
g a PR automatically makes it &quot;In Progress&quot; but there&#39;s no co=
mplementary process to un-mark it. You can ignore the Open / In Progress di=
stinction really.</div><div class=3D"m_3466116339879802086m_148965419244172=
2201gmail_msg gmail_msg"><br class=3D"m_3466116339879802086m_14896541924417=
22201gmail_msg gmail_msg"></div><div class=3D"m_3466116339879802086m_148965=
4192441722201gmail_msg gmail_msg">This one is interesting because it does s=
eem like a plausible feature to add. The original PR was abandoned by the a=
uthor and nobody else submitted one -- despite the Votes. I hesitate to sig=
nal that no PRs would be considered, but it doesn&#39;t seem to be in enou=
gh demand for anyone to work on it?</div><div class=3D"m_346611633987980=
2086m_1489654192441722201gmail_msg gmail_msg"><br class=3D"m_34661163398798=
02086m_1489654192441722201gmail_msg gmail_msg"></div><div class=3D"m_346611=
6339879802086m_1489654192441722201gmail_msg gmail_msg"><br class=3D"m_34661=
16339879802086m_1489654192441722201gmail_msg gmail_msg"></div><div class=3D=
"m_3466116339879802086m_1489654192441722201gmail_msg gmail_msg">I think one=
 of my messages is that, de facto, here, like in many Apache projects, comm=
itters do not take requests. They pursue the work they believe needs doing,=
 and shepherd work initiated by others (a clear bug report, a PR) to a reso=
lution. Things get done by doing them, or by building influence by doing ot=
her things the project needs. It isn&#39;t a mechanical, objective pr=
ocess, and can&#39;t be. But it does work in a recognizable way.</div><bloc=
kquote class=3D"gmail_quote m_3466116339879802086m_1489654192441722201gmail=
_msg gmail_msg" style=3D"margin:0 0 0 .8ex;border-left:1px #ccc solid;paddi=
ng-left:1ex"><div lang=3D"EN-US" link=3D"blue" vlink=3D"purple" class=3D"m_=
3466116339879802086m_1489654192441722201m_-945303160246519398m_757163258352=
9928415gmail_msg m_3466116339879802086m_1489654192441722201gmail_msg gmail_=
msg"><div class=3D"m_3466116339879802086m_1489654192441722201m_-94530316024=
6519398m_7571632583529928415m_6498132320081888578WordSection1 m_34661163398=
79802086m_1489654192441722201m_-945303160246519398m_7571632583529928415gmai=
l_msg m_3466116339879802086m_1489654192441722201gmail_msg gmail_msg"><div c=
lass=3D"m_3466116339879802086m_1489654192441722201m_-945303160246519398m_75=
71632583529928415gmail_msg m_3466116339879802086m_1489654192441722201gmail_=
msg gmail_msg"><div class=3D"m_3466116339879802086m_1489654192441722201m_-9=
45303160246519398m_7571632583529928415gmail_msg m_3466116339879802086m_1489=
654192441722201gmail_msg gmail_msg"><div class=3D"m_3466116339879802086m_14=
89654192441722201m_-945303160246519398m_7571632583529928415gmail_msg m_3466=
116339879802086m_1489654192441722201gmail_msg gmail_msg">
</div>
</div></div></div></div></blockquote></div></div>
</blockquote></div><br class=3D"m_3466116339879802086m_1489654192441722201g=
mail_msg gmail_msg"><br clear=3D"all" class=3D"m_3466116339879802086m_14896=
54192441722201gmail_msg gmail_msg"><div class=3D"m_3466116339879802086m_148=
9654192441722201gmail_msg gmail_msg"><br class=3D"m_3466116339879802086m_14=
89654192441722201gmail_msg gmail_msg"></div></div><div class=3D"gmail_extra=
 m_3466116339879802086m_1489654192441722201gmail_msg gmail_msg">-- <br clas=
s=3D"m_3466116339879802086m_1489654192441722201gmail_msg gmail_msg"><div cl=
ass=3D"m_3466116339879802086m_1489654192441722201m_-945303160246519398gmail=
_signature m_3466116339879802086m_1489654192441722201gmail_msg gmail_msg" d=
ata-smartmail=3D"gmail_signature"><div dir=3D"ltr" class=3D"m_3466116339879=
802086m_1489654192441722201gmail_msg gmail_msg"><p style=3D"font-size:small=
;margin-top:0pt;margin-bottom:0pt" class=3D"m_3466116339879802086m_14896541=
92441722201gmail_msg gmail_msg"><font color=3D"#000000" face=3D"Arial" clas=
s=3D"m_3466116339879802086m_1489654192441722201gmail_msg gmail_msg"><span s=
tyle=3D"font-size:12.6667px;line-height:15.2px;white-space:pre-wrap" class=
=3D"m_3466116339879802086m_1489654192441722201gmail_msg gmail_msg">Joseph B=
radley</span></font></p><p style=3D"font-size:small;margin-top:0pt;margin-b=
ottom:0pt" class=3D"m_3466116339879802086m_1489654192441722201gmail_msg gma=
il_msg"><font color=3D"#000000" face=3D"Arial" class=3D"m_34661163398798020=
86m_1489654192441722201gmail_msg gmail_msg"><span style=3D"font-size:12.666=
7px;line-height:15.2px;white-space:pre-wrap" class=3D"m_3466116339879802086=
m_1489654192441722201gmail_msg gmail_msg">Software Engineer - Machine Learn=
ing</span></font></p><p dir=3D"ltr" style=3D"font-size:12.8px;line-height:1=
.2;margin-top:0pt;margin-bottom:0pt" class=3D"m_3466116339879802086m_148965=
4192441722201gmail_msg gmail_msg"><span style=3D"font-size:12.6667px;font-f=
amily:arial;color:rgb(0,0,0);vertical-align:baseline;white-space:pre-wrap" =
class=3D"m_3466116339879802086m_1489654192441722201gmail_msg gmail_msg">Dat=
abricks, Inc.</span></p><p dir=3D"ltr" style=3D"font-size:12.8px;line-heigh=
t:1.2;margin-top:0pt;margin-bottom:0pt" class=3D"m_3466116339879802086m_148=
9654192441722201gmail_msg gmail_msg"><a href=3D"http://databricks.com/" sty=
le=3D"font-size:12.8px;color:rgb(17,85,204)" class=3D"m_3466116339879802086=
m_1489654192441722201gmail_msg gmail_msg" target=3D"_blank"><img src=3D"htt=
ps://databricks.com/wp-content/uploads/2016/11/db-bug-email-sig-16px.png" w=
idth=3D"16" height=3D"16" alt=3D"http://databricks.com" class=3D"m_34661163=
39879802086m_1489654192441722201gmail_msg gmail_msg"></a><br class=3D"m_346=
6116339879802086m_1489654192441722201gmail_msg gmail_msg"></p></div></div>
</div></blockquote></div></div></div></div></div>
</blockquote></div><br class=3D"gmail_msg"></div>
</blockquote></div></div></div>

--001a114a91ce9641d50549428384--