spark-dev mailing list archives

From Hyukjin Kwon <>
Subject Re: Crowdsourced triage Scapegoat compiler plugin warnings
Date Thu, 13 Jul 2017 08:16:39 GMT
Hi all,

Another gentle ping for help.

I will probably open a JIRA and proceed with this in a couple of weeks if
no one else picks it up, although I hope someone does.


2017-06-18 2:16 GMT+09:00 Sean Owen <>:

> Looks like a whole lot of the results have been analyzed. I suspect
> there's more than enough to act on already. I think we should wait until
> after 2.2 is done.
> Does anybody have a preference for how to proceed here -- just open a JIRA
> to take care of a batch of related types of issues and go for it?
> On Sat, Jun 17, 2017 at 4:45 PM Hyukjin Kwon <> wrote:
>> Gentle ping to dev for help. I hope this effort is not abandoned.
>> On 25 May 2017 9:41 am, "Josh Rosen" <> wrote:
>> I'm interested in using the Scapegoat Scala compiler plugin to find
>> potential bugs and performance problems in Spark. Scapegoat has a useful
>> built-in set of inspections and is pretty easy to extend with custom ones.
>> For example, I added an inspection to spot places where we call
>> *.apply()* on a Seq which is not an IndexedSeq, in order to make it
>> easier to spot potential O(n^2) performance bugs.
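
To illustrate the kind of pattern such an inspection targets, here is a minimal sketch. It is not taken from Scapegoat or Spark, and the names `SeqApplyDemo`, `sumByIndex`, `sumByIteration` and `sumByIndexSafe` are hypothetical. When the `Seq` is backed by a `List`, each `xs(i)` walks the list from its head, so an index-based loop becomes quadratic.

```scala
object SeqApplyDemo {
  // Quadratic when xs is a List: xs(i) walks i nodes from the head each time.
  def sumByIndex(xs: Seq[Int]): Long = {
    val n = xs.length // computed once; length is itself O(n) on a List
    var total = 0L
    var i = 0
    while (i < n) {
      total += xs(i)
      i += 1
    }
    total
  }

  // Linear for any Seq: a single traversal, no positional lookups.
  def sumByIteration(xs: Seq[Int]): Long =
    xs.foldLeft(0L)(_ + _)

  // Linear overall: copy once into an IndexedSeq, then index cheaply.
  def sumByIndexSafe(xs: Seq[Int]): Long =
    sumByIndex(xs.toIndexedSeq)
}
```

All three functions return the same result; only the cost differs. Whether a given call site is a real problem depends on the runtime type and size of the sequence, which is why such warnings need manual triage.
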
>> There are lots of false-positives and benign warnings (as with any linter
>> / static analyzer) so I don't think it's feasible for us to include this as
>> a blocking step in our regular build. I am planning to build tooling to
>> surface only new warnings so going forward this can become a useful
>> code-review aid.
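
One possible way to surface only new warnings is sketched below. The thread does not describe the actual tooling, so everything here is an assumption: the `Warning` record shape, the `NewWarnings` object, and the choice to match warnings on file, inspection name and source snippet while ignoring line numbers.

```scala
object NewWarnings {
  // Hypothetical record shape; Scapegoat's real report format is not assumed.
  final case class Warning(file: String, line: Int, inspection: String, snippet: String)

  // Identity key that deliberately ignores the line number, so a warning that
  // merely moved because of unrelated edits is not reported as new.
  private def key(w: Warning): (String, String, String) =
    (w.file, w.inspection, w.snippet.trim)

  // Return the warnings in `current` that have no counterpart in `baseline`.
  // Occurrences are counted: a second copy of a known warning is still new.
  def newWarnings(baseline: Seq[Warning], current: Seq[Warning]): Seq[Warning] = {
    val remaining = scala.collection.mutable.Map.empty[(String, String, String), Int]
    baseline.foreach { w =>
      val k = key(w)
      remaining(k) = remaining.getOrElse(k, 0) + 1
    }
    current.toList.filter { w =>
      val k = key(w)
      val left = remaining.getOrElse(k, 0)
      if (left > 0) {
        remaining(k) = left - 1 // consume one matching baseline occurrence
        false
      } else {
        true
      }
    }
  }
}
```

Matching on the snippet instead of the line number trades some precision for stability: a warning whose source line is edited will show up as new, which is arguably what a reviewer wants to see.
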
>> The current codebase has roughly 1700 warnings that I would like to
>> triage and categorize as false-positives or real bugs. I can't do this
>> alone, so here's how you can help:
>>    - Visit the Google Docs spreadsheet at
>>    spreadsheets/d/1z7xNMjx7VCJLCiHOHhTth7Hh4R0F6LwcGjEwCDzrCiM/edit?usp=sharing
>>    and find an un-triaged warning.
>>    - In the columns at the right of the sheet, enter your name in the
>>    appropriate column to mark a warning as a false-positive or as a real bug
>>    and/or performance issue. If you think a warning is a real issue, then use the
>>    "comments" column for providing additional detail.
>>    - Please don't file JIRAs or PRs for individual warnings; I suspect
>>    that we'll find clusters of issues which are best fixed in a few larger PRs
>>    vs. lots of smaller ones. Certain warnings are probably simply style issues
>>    so we should discuss those before trying to fix them.
>> The sheet has hidden columns capturing the Spark revision and Scapegoat
>> revision. I can use this to programmatically update the sheet and remap
>> lines after updating either Scapegoat (to suppress false-positives) or
>> Spark (to incorporate fixes and surface new warnings). For those who are
>> interested, the sheet was produced with this script:
>> https://gist.github.com/JoshRosen/1ae12a979880d9a98988aa87d70ff2a8
>> Depending on the results of this experiment we might want to integrate a
>> high-signal subset of the Scapegoat warnings into our build. I'm also
>> hoping that we'll be able to build a useful corpus of triaged warnings in
>> order to help improve Scapegoat itself and eliminate common false-positives.
>> Thanks and happy bug-hunting,
>> Josh Rosen
