spark-dev mailing list archives

From Maciej Szymkiewicz <mszymkiew...@gmail.com>
Subject Re: Handling questions in the mailing lists
Date Thu, 10 Nov 2016 05:52:43 GMT
If you take a look at the statistics
(https://data.stackexchange.com/stackoverflow/query/575406) you'll see
that the majority of the unanswered questions:

  * have seen no activity in the last year OR
  * don't have a positive score OR
  * have been asked by inactive or new users.

This is usually a good indicator that a question is of poor quality and / or
abandoned, and for various reasons hasn't been picked up by the removal
process (https://stackoverflow.com/help/roomba). This is not unusual for
Stack Overflow, and with a little bit of organized effort it could be
cleaned up in a few weeks.
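
As a rough illustration only, the three criteria above could be expressed as
a filter like the one below. This is a sketch, not the linked query: the
Question fields, the reputation threshold for a "new" user, and the one-year
cutoff for an "inactive" user are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

ONE_YEAR = timedelta(days=365)


@dataclass
class Question:
    # Hypothetical fields; not the Stack Exchange Data Explorer schema.
    score: int
    last_activity: datetime
    asker_reputation: int
    asker_last_seen: datetime


def is_cleanup_candidate(q: Question, now: datetime, min_reputation: int = 50) -> bool:
    """Return True if the question matches at least one of the three criteria."""
    # No activity in the last year.
    no_recent_activity = now - q.last_activity > ONE_YEAR
    # No positive score.
    no_positive_score = q.score <= 0
    # Asked by an inactive or new user (both thresholds are assumptions).
    inactive_or_new_asker = (
        now - q.asker_last_seen > ONE_YEAR or q.asker_reputation < min_reputation
    )
    return no_recent_activity or no_positive_score or inactive_or_new_asker


if __name__ == "__main__":
    now = datetime(2016, 11, 10)
    example = Question(
        score=0,
        last_activity=now - timedelta(days=30),
        asker_reputation=1200,
        asker_last_seen=now - timedelta(days=2),
    )
    print(is_cleanup_candidate(example, now))
```

The criteria are joined with OR, as in the list above, so a single match is
enough to flag a question for review.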

Arguably, for a technology with a large number of moving parts, Spark
has a pretty decent /answer rate/, definitely better than many
comparable projects.

Regarding tagging: putting community rules aside, clean questions which
can be answered with relatively low effort are usually resolved in a few
days. What is left is either too time-consuming, too complex, or just
not worth the time. If you have a lot of time, the former can be
easily selected using predefined filters, and the rest usually qualifies
for closing.

Still, I believe there is a really important point missing here. All of
that requires a lot of effort, and it is slightly unrealistic to expect
that the number of people willing and having time to contribute will
suddenly grow. So the focus should be on having a knowledge base which
can reduce the number of questions to be answered. SO has good visibility,
a large number of existing answers, and very good tools.

On 11/09/2016 08:02 AM, assaf.mendelson wrote:
>
> I like the document and I think it is good but I still feel like we
> are missing an important part here.
>
>  
>
> Look at SO today. There are:
>
> - 4658 unanswered questions under the apache-spark tag.
> - 394 unanswered questions under the spark-dataframe tag.
> - 639 unanswered questions under the apache-spark-sql tag.
> - 859 unanswered questions under the pyspark tag.
>
>  
>
> Just moving people to ask there will not help. The whole issue is
> having people answer the questions.
>
>  
>
> The problem is that many of these questions do not fit SO (but are
> already there so they are noise), are bad (i.e. unclear or hard to
> answer), orphaned etc. while some are simply harder than what people
> with some experience in spark can handle and require more expertise.
>
> The problem is that people with the relevant expertise are drowning in
> noise. This is true for the mailing list and this is true for SO.
>
>  
>
> For this reason I believe that just moving people to SO will not solve
> anything.
>
>  
>
> My original thought was that if we had different tags then different
> people could watch open questions on these tags and therefore have a
> much lower noise. I thought that we would have a low tier (current
> one) of people just not following the documentation (which would
> remain as noise), then a beginner tier where we could have people
> downvoting bad questions but in most cases the community can answer
> the questions because they are common, then a “medium” tier which
> would mean harder questions but that can still be answered by advanced
> users and lastly an “advanced” tier to which committers can actually
> subscribe (and adding sub tags for subsystems would improve this
> even more).
>
>  
>
> I was not aware of SO policy for meta tags (the burnination link is
> about removing tags completely so I am not sure how it applies, I
> believe this link
> https://stackoverflow.blog/2010/08/the-death-of-meta-tags/ is more
> relevant).
>
> There was actually a discussion along these lines on SO
> (http://meta.stackoverflow.com/questions/253338/filtering-questions-by-difficulty-level).
>
>  
>
> The fact that SO did not solve this issue does not mean we shouldn’t
> try to.
>
>  
>
> The way I see it, some tags can easily be used even with the meta tags
> limitation. For example, a spark-internal-development tag could be
> used to ask questions about the development of Spark. There are already
> tags for some Spark subsystems (there is an apache-spark-sql tag, a
> pyspark tag, a spark-streaming tag etc.). The main issue I see, and the
> one we can’t seem to get around, is dividing between simple questions
> that the community should answer and hard questions which only advanced
> users can answer.
>
>  
>
> Maybe SO isn’t the correct platform for that but even within it we can
> try to find a non meta name for spark beginner questions vs. spark
> advanced questions.
>
> Assaf.
>
>  
>
>  
>
> *From:* Denny Lee [via Apache Spark Developers List]
> [mailto:ml-node+[hidden email]]
> *Sent:* Tuesday, November 08, 2016 7:53 AM
> *To:* Mendelson, Assaf
> *Subject:* Re: Handling questions in the mailing lists
>
>  
>
> To help track and get the verbiage for the Spark community page and
> welcome email jump started, here's a working document for us to work
> with: https://docs.google.com/document/d/1N0pKatcM15cqBPqFWCqIy6jdgNzIoacZlYDCjufBh2s/edit
>
>  
>
> Hope this will help us collaborate on this stuff a little faster.  
>
>  
>
> On Mon, Nov 7, 2016 at 2:25 PM Maciej Szymkiewicz <[hidden email]> wrote:
>
>     Just a couple of random thoughts regarding Stack Overflow...
>
>       * If we are thinking about shifting focus towards SO, all
>         attempts at micromanaging should be discarded right at the
>         beginning, especially things like meta tags, which are
>         discouraged and "burninated"
>         (https://meta.stackoverflow.com/tags/burninate-request/info),
>         or thread bumping. Depending on the context, these won't be
>         manageable, will go against community guidelines, or are simply
>         obsolete.
>       * Lack of expertise is unlikely to be an issue. Even now there are
>         a number of advanced Spark users on SO. Of course, the more the
>         merrier.
>
>     Things that can be easily improved:
>
>       * Identifying, improving and promoting canonical questions and
>         answers. This means closing duplicates, suggesting edits to
>         improve existing answers, and providing alternative solutions.
>         This can also be used to identify gaps in the documentation.
>       * Providing a set of clear posting guidelines to reduce the effort
>         required to identify the problem (think about
>         http://stackoverflow.com/q/5963269 a.k.a. How to make a great R
>         reproducible example?)
>       * Helping users decide if a question is a good fit for SO (see
>         below). API questions are a great fit, debugging problems like
>         "my cluster is slow" are not.
>       * Actively cleaning (closing, deleting) off-topic and low
>         quality questions. The less junk there is to sieve through, the
>         better the chance of good questions being answered.
>       * Repurposing and actively moderating SO docs
>         (https://stackoverflow.com/documentation/apache-spark/topics).
>         Right now most of the stuff that goes there is useless,
>         duplicated, plagiarized, or borderline spam.
>       * Encouraging the community to monitor featured
>         (https://stackoverflow.com/questions/tagged/apache-spark?sort=featured)
>         and active & upvoted & unanswered
>         (https://stackoverflow.com/unanswered/tagged/apache-spark)
>         questions.
>       * Implementing some procedure to identify questions which are
>         likely to be bugs or material for feature requests.
>         Personally I am quite often tempted to simply send a link to
>         the dev list, but I don't think it is really acceptable.
>       * Animating a Spark-related chat room. I tried this a couple of
>         times but to no avail. Without a certain critical mass of
>         users it just won't work.
>
>     On 11/07/2016 07:32 AM, Reynold Xin wrote:
>
>         This is an excellent point. If we do go ahead and feature SO
>         as a way for users to ask questions more prominently, as
>         someone who knows SO very well, would you be willing to help
>         write a short guideline (ideally the shorter the better, which
>         makes it hard) to direct what goes to user@ and what goes to SO?
>
>      
>
>     Sure, I'll be happy to help if I can.
>
>     On Sun, Nov 6, 2016 at 9:54 PM, Maciej Szymkiewicz <[hidden email]> wrote:
>
>     Damn, I always thought that the mailing list is only for nice and
>     welcoming people and there is nothing for me to do here >:)
>
>     To be serious though, there are many questions on the users list
>     which would fit just fine on SO, but that is not true in general.
>     There are dozens of questions which are too broad, opinion based,
>     ask for external resources and so on. If you want to direct users
>     to SO you have to help them decide if it is the right channel.
>     Otherwise it will just create a really bad experience for both
>     those seeking help and active answerers. The former will be
>     downvoted and bashed; the latter will have to deal with all the
>     junk, and the number of active Spark users with moderation
>     privileges is really low (with only Massg and me being able to
>     directly close duplicates).
>
>     Believe me, I've seen this before.
>
>     On 11/07/2016 05:08 AM, Reynold Xin wrote:
>
>         You have substantially underestimated how opinionated people
>         can be on mailing lists too :)
>
>         On Sunday, November 6, 2016, Maciej Szymkiewicz <[hidden email]> wrote:
>
>         You have to remember that the Stack Overflow crowd (like me) is
>         highly opinionated, so many questions which could be just
>         fine on the mailing list will be quickly downvoted and / or
>         closed as off-topic. Just saying...
>
>         -- 
>
>         Best, 
>
>         Maciej
>
>          
>
>         On 11/07/2016 04:03 AM, Reynold Xin wrote:
>
>             OK I've checked on the ASF member list (which is private
>             so there is no public archive).
>
>              
>
>             It is not against any ASF rule to recommend StackOverflow
>             as a place for users to ask questions. I don't think we
>             can or should delete the existing user@spark list either,
>             but we can certainly make SO more visible than it is.
>
>             On Wed, Nov 2, 2016 at 10:21 AM, Reynold Xin <[hidden email]> wrote:
>
>             Actually after talking with more ASF members, I believe
>             the only policy is that development decisions have to be
>             made and announced on ASF properties (dev list or jira),
>             but user questions don't have to. 
>
>              
>
>             I'm going to double check this. If it is true, I would
>             actually recommend moving the Q&A part of the user list
>             over to Stack Overflow entirely, or at least making that the
>             recommended way rather than the existing user list, which
>             is not very scalable.
>
>
>
>             On Wednesday, November 2, 2016, Nicholas Chammas <[hidden email]> wrote:
>
>             We’ve discussed several times upgrading our communication
>             tools, as far back as 2014 and maybe even before that too.
>             The bottom line is that we can’t due to ASF rules
>             requiring the use of ASF-managed mailing lists.
>
>             For some history, see this discussion:
>
>             * https://mail-archives.apache.org/mod_mbox/spark-user/201412.mbox/%3CCAOhmDzfL2COdysV8r5hZN8f=NqXM=f=oY5NO2dHWJ_kVEoP+Ng@mail.gmail.com%3E
>
>             * https://mail-archives.apache.org/mod_mbox/spark-user/201501.mbox/%3CCAOhmDzec1JdsXQq3dDwAv7eLnzRidSkrsKKG0xKw=TKTxY_sYw@mail.gmail.com%3E
>
>             (It’s ironic that it’s difficult to follow the past
>             discussion on why we can’t change our official
>             communication tools due to those very tools…)
>
>             Nick
>
>             On Wed, Nov 2, 2016 at 12:24 PM Ricardo Almeida <[hidden email]> wrote:
>
>                 I feel Assaf's point is quite relevant if we want to
>                 move this project forward from the Spark user
>                 perspective (as I do). In fact, we're still using 20th
>                 century tools (mailing lists) with some add-ons (like
>                 Stack Overflow).
>
>                  
>
>                 As usual, Sean's and Cody's contributions are very much
>                 to the point.
>
>                 I feel it is indeed a matter of culture (hard to
>                 enforce) and tools (much easier). Isn't it?
>
>                  
>
>                 On 2 November 2016 at 16:36, Cody Koeninger <[hidden email]> wrote:
>
>                 So concrete things people could do
>
>                 - users could tag subject lines appropriately to the
>                 component they're asking about
>
>                 - contributors could monitor user@ for tags relating to
>                 components they've worked on.
>                 I'd be surprised if my miss rate for any mailing list
>                 questions well-labeled as Kafka was higher than 5%
>
>                 - committers could be more aggressive about soliciting and
>                 merging PRs to improve documentation.
>                 It's a lot easier to answer even poorly-asked questions
>                 with a link to relevant docs.
>
>
>                 On Wed, Nov 2, 2016 at 7:39 AM, Sean Owen <[hidden email]>
>                 wrote:
>                 > There's already reviews@ and issues@. dev@ is for project
>                 > development itself and I think is OK. You're suggesting
>                 > splitting up user@ and I sympathize with the motivation.
>                 > Experience tells me that we'll have a beginner@ that's
>                 > then totally ignored, and people will quickly learn to
>                 > post to advanced@ to get attention, and we'll be back
>                 > where we started. Putting it in JIRA doesn't help. I
>                 > don't think this is a problem that is merely down to lack
>                 > of process. It actually requires cultivating a culture
>                 > change on the community list.
>                 >
>                 > On Wed, Nov 2, 2016 at 12:11 PM Mendelson, Assaf
>                 > <[hidden email]> wrote:
>                 >>
>                 >> What I am suggesting is basically to fix that.
>                 >>
>                 >> For example, we might say that mailing list A is only for
>                 >> voting, mailing list B is only for PR and have something
>                 >> like stack overflow for developer questions (I would even
>                 >> go as far as to have beginner, intermediate and advanced
>                 >> mailing list for users and beginner/advanced for dev).
>                 >>
>                 >> This can easily be done using stack overflow tags,
>                 >> however, that would probably be harder to manage.
>                 >>
>                 >> Maybe using special jira tags and manage it in jira?
>                 >>
>                 >> Anyway as I said, the main issue is not user questions
>                 >> (except maybe advanced ones) but more for dev questions.
>                 >> It is so easy to get lost in the chatter that it makes it
>                 >> very hard for people to learn spark internals…
>                 >>
>                 >> Assaf.
>                 >>
>                 >>
>                 >>
>                 >> From: Sean Owen [mailto:[hidden email]]
>                 >> Sent: Wednesday, November 02, 2016 2:07 PM
>                 >> To: Mendelson, Assaf; [hidden email]
>                 >> Subject: Re: Handling questions in the mailing lists
>                 >>
>                 >>
>                 >>
>                 >> I think that unfortunately mailing lists don't scale
>                 >> well. This one has thousands of subscribers with
>                 >> different interests and levels of experience. For any
>                 >> given person, most messages will be irrelevant. I also
>                 >> find that a lot of questions on user@ are not well-asked,
>                 >> aren't an SSCCE (http://sscce.org/), not something most
>                 >> people are going to bother replying to even if they could
>                 >> answer. I almost entirely ignore user@ because there are
>                 >> higher-priority channels like PRs to deal with, that
>                 >> already have hundreds of messages per day. This is why
>                 >> little of it gets an answer -- too noisy.
>                 >>
>                 >> We have to have official mailing lists, in any event, to
>                 >> have some official channel for things like votes and
>                 >> announcements. It's not wrong to ask questions on user@
>                 >> of course, but a lot of the questions I see could have
>                 >> been answered with research of existing docs or looking
>                 >> at the code. I think that given the scale of the list,
>                 >> it's not wrong to assert that this is sort of a
>                 >> prerequisite for asking thousands of people to answer
>                 >> one's question. But we can't enforce that.
>                 >>
>                 >> The situation will get better to the extent people ask
>                 >> better questions, help other people ask better questions,
>                 >> and answer good questions. I'd encourage anyone feeling
>                 >> this way to try to help along those dimensions.
>                 >>
>                 >> On Wed, Nov 2, 2016 at 11:32 AM assaf.mendelson
>                 >> <[hidden email]> wrote:
>                 >>
>                 >> Hi,
>                 >>
>                 >> I know this is a little off topic but I wanted to raise
>                 >> an issue about handling questions in the mailing list
>                 >> (this is true both for the user mailing list and the dev
>                 >> but since there are other options such as stack overflow
>                 >> for user questions, this is more problematic in dev).
>                 >>
>                 >> Let’s say I ask a question (as I recently did).
>                 >> Unfortunately this was during spark summit in Europe so
>                 >> probably people were busy. In any case no one answered.
>                 >>
>                 >> The problem is that if no one answers very soon, the
>                 >> question will almost certainly remain unanswered because
>                 >> new messages will simply drown it.
>                 >>
>                 >> This is a common issue not just for questions but for any
>                 >> comment or idea which is not immediately picked up.
>                 >>
>                 >> I believe we should have a method of handling this.
>                 >>
>                 >> Generally, I would say these types of things belong in
>                 >> stack overflow; after all, the way it is built is perfect
>                 >> for this. More seasoned spark contributors and committers
>                 >> can periodically check out unanswered questions and
>                 >> answer them.
>                 >>
>                 >> The problem is that stack overflow (as well as other
>                 >> targets such as the databricks forums) tends to have a
>                 >> more user based orientation. This means that any spark
>                 >> internal question will almost certainly remain
>                 >> unanswered.
>                 >>
>                 >> I was wondering if we could come up with a solution for
>                 >> this.
>                 >>
>                 >>
>                 >>
>                 >> Assaf.
>
>     -- 
>
>     Maciej Szymkiewicz
>

-- 
Maciej Szymkiewicz